feat: Add array data type support #433

Open
charlesdong1991 wants to merge 14 commits into apache:main from
charlesdong1991:issue-386
@charlesdong1991
Contributor

Purpose

Linked issue: close #386

Brief change log

Add end-to-end ARRAY column support in fluss-rust

Tests

Yes, all tests pass locally.

let (val, next) = reader.read_long(cursor);
let decimal = Decimal::from_unscaled_long(val, precision, scale)
.expect("Failed to create decimal from unscaled long");
let decimal =
Contributor Author

@charlesdong1991 charlesdong1991 Mar 7, 2026

I think we should convert all `expect` calls to typed errors so that malformed array payload handling does not leave a panic backdoor.
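
A minimal sketch of what this suggestion could look like, assuming a hypothetical `DecodeError` enum and a stand-in `ToyDecimal` type (neither is the actual fluss-rust API). The constructor returns a `Result`, so a malformed payload surfaces as an error instead of a panic:

```rust
use std::fmt;

/// Hypothetical error type for this sketch; the real crate's error enum may differ.
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    InvalidDecimal { unscaled: i64, precision: u32, scale: u32 },
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DecodeError::InvalidDecimal { unscaled, precision, scale } => write!(
                f,
                "unscaled value {unscaled} does not fit DECIMAL({precision}, {scale})"
            ),
        }
    }
}

impl std::error::Error for DecodeError {}

/// Stand-in for the real decimal type (assumption, for illustration only).
#[derive(Debug, PartialEq)]
pub struct ToyDecimal {
    pub unscaled: i64,
    pub precision: u32,
    pub scale: u32,
}

/// Fallible constructor: returns a typed error instead of panicking.
pub fn decimal_from_unscaled(
    unscaled: i64,
    precision: u32,
    scale: u32,
) -> Result<ToyDecimal, DecodeError> {
    let invalid = || DecodeError::InvalidDecimal { unscaled, precision, scale };
    // A compact (i64-backed) decimal holds at most 18 digits in this sketch.
    if precision == 0 || precision > 18 || scale > precision {
        return Err(invalid());
    }
    let limit = 10i64.pow(precision); // 10^18 still fits in i64
    if unscaled <= -limit || unscaled >= limit {
        return Err(invalid());
    }
    Ok(ToyDecimal { unscaled, precision, scale })
}
```

A caller inside a reader would then write `let decimal = decimal_from_unscaled(val, precision, scale)?;` and let the error propagate to whoever requested the row.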

// silently widen key semantics.
if matches!(
field_type,
DataType::Array(_) | DataType::Map(_) | DataType::Row(_)
Contributor Author

This check applies to all of Array/Map/Row. I understand this PR only adds Array, but I think it is better to reject all of them here.
If there is any objection, I can remove it.
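
For illustration, the rejection could be sketched as below. The `DataType` enum here is a simplified stand-in (the real one has more variants and different payloads), and `validate_key_type` is a hypothetical helper name:

```rust
/// Simplified stand-in for the crate's `DataType` (assumption, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int,
    String,
    Array(Box<DataType>),
    Map(Box<DataType>, Box<DataType>),
    Row(Vec<DataType>),
}

/// Rejects complex types as key columns with an error instead of encoding them,
/// so key semantics are never silently widened.
pub fn validate_key_type(field_type: &DataType) -> Result<(), String> {
    if matches!(
        field_type,
        DataType::Array(_) | DataType::Map(..) | DataType::Row(_)
    ) {
        return Err(format!("unsupported key column type: {:?}", field_type));
    }
    Ok(())
}
```

Rejecting all three complex types in one place means that adding Map or Row support later cannot accidentally make them valid key columns.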

@luoyuxia
Contributor

luoyuxia commented Mar 8, 2026

@charlesdong1991 Thanks for the great PR. I'll find some time to review it.

Copilot AI left a comment

Pull request overview

Adds end-to-end ARRAY column support to the Fluss Rust client, including a Java-compatible binary array representation and integration across row encoders/decoders, Arrow interop, and public APIs.

Changes:

  • Introduces FlussArray / FlussArrayWriter (BinaryArray layout) and wires it into Datum, InternalRow, and runtime field/value writers.
  • Adds compacted row read/write support for arrays (length-prefixed BinaryArray bytes) and rejects ARRAY as a key column type.
  • Extends Arrow conversion logic and updates user/docs + tests for primitive, nullable, empty, and nested arrays.
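
The length-prefixed layout with fallible decoding mentioned above can be sketched roughly as follows. The fixed 4-byte little-endian prefix, the `ReadError` type, and both function names are assumptions for illustration; the actual compacted row format may encode the length differently:

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ReadError {
    Truncated { needed: usize, available: usize },
}

/// Appends `payload` to `buf`, preceded by its length as 4 little-endian bytes.
pub fn write_length_prefixed(buf: &mut Vec<u8>, payload: &[u8]) {
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(payload);
}

/// Reads one length-prefixed payload starting at `cursor`.
/// Returns the payload slice and the next cursor, or an error if the buffer is too short.
pub fn read_length_prefixed(buf: &[u8], cursor: usize) -> Result<(&[u8], usize), ReadError> {
    let header_end = cursor
        .checked_add(4)
        .filter(|&end| end <= buf.len())
        .ok_or(ReadError::Truncated {
            needed: 4,
            available: buf.len().saturating_sub(cursor),
        })?;
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&buf[cursor..header_end]);
    let len = u32::from_le_bytes(len_bytes) as usize;

    let payload_end = header_end
        .checked_add(len)
        .filter(|&end| end <= buf.len())
        .ok_or(ReadError::Truncated {
            needed: len,
            available: buf.len() - header_end,
        })?;
    Ok((&buf[header_end..payload_end], payload_end))
}
```

Every bounds check happens before slicing, so a truncated or corrupted buffer produces `ReadError::Truncated` rather than an index panic.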

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| website/docs/user-guide/rust/data-types.md | Documents ARRAY<T> type mapping and how to construct/read arrays. |
| website/docs/user-guide/rust/api-reference.md | Adds InternalRow::get_array and documents FlussArray API. |
| crates/fluss/src/row/mod.rs | Exposes binary_array module and re-exports FlussArray; adds InternalRow::get_array. |
| crates/fluss/src/row/field_getter.rs | Adds Array field getter + tests. |
| crates/fluss/src/row/encode/compacted_key_encoder.rs | Adds test ensuring array types are rejected in key encoding. |
| crates/fluss/src/row/datum.rs | Adds Datum::Array and Arrow ListBuilder append/conversion logic. |
| crates/fluss/src/row/compacted/compacted_row_writer.rs | Adds write_array (delegates to length-prefixed bytes). |
| crates/fluss/src/row/compacted/compacted_row_reader.rs | Makes deserialization fallible and adds array decoding via FlussArray::from_bytes. |
| crates/fluss/src/row/compacted/compacted_row.rs | Propagates fallible deserialization and adds InternalRow::get_array + array tests. |
| crates/fluss/src/row/compacted/compacted_key_writer.rs | Explicitly rejects complex key types and adds write_array to the delegated writer surface. |
| crates/fluss/src/row/column.rs | Implements ColumnarRow::get_array by converting Arrow ListArray to FlussArray; adds tests. |
| crates/fluss/src/row/binary_array.rs | New Java-compatible binary array implementation + writer + tests. |
| crates/fluss/src/row/binary/binary_writer.rs | Adds BinaryWriter::write_array and InnerValueWriter::Array. |
| crates/fluss/src/record/arrow.rs | Adds Arrow List builder support and from_arrow_type mapping for list element conversion. |
| bindings/cpp/src/types.rs | Propagates Datum::Array handling in type resolution and row ownership conversion. |

Contributor

@fresh-borzoni fresh-borzoni left a comment

@charlesdong1991 LGTM overall, left minor comments.
If you fixed something in the meantime, good; sorry for the noise, as I started reviewing before the new code landed :)

@charlesdong1991
Contributor Author

Hi @fresh-borzoni, thanks a lot!
It would be great if you could do another round of review, since I made some changes during your review ^^

Contributor

@leekeiabstraction leekeiabstraction left a comment

Thank you for the PR! Left some comments. PTAL

Contributor

@fresh-borzoni fresh-borzoni left a comment

@charlesdong1991 TY, looked through again, left comments
PTAL

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.


Contributor

@fresh-borzoni fresh-borzoni left a comment

@charlesdong1991 TY, Looked through, LGTM overall, one nit comment 👍

@charlesdong1991
Contributor Author

Thanks both! PTAL @fresh-borzoni @leekeiabstraction @luoyuxia

@leekeiabstraction
Contributor

leekeiabstraction commented Mar 11, 2026

Test case changes look good to me, but the diff seems larger than expected. It seems to be squashed with the is retriable changes from another PR? Not sure if it will conflict. LGTM otherwise.

Edit: for some reason I mistook the diff between commits for the overall diff.

Contributor

@fresh-borzoni fresh-borzoni left a comment

@charlesdong1991 TY, LGTM

Contributor

@leekeiabstraction leekeiabstraction left a comment

Approved. TY for the PR!

@luoyuxia
Contributor

luoyuxia commented Mar 15, 2026

@charlesdong1991 Hi, is it possible to add an IT covering array support? Or we can merge this first, but not expose it to users in the documentation until we add one to make sure it works.
I don't mind adding it in another PR (and also fixing any possible issues found by the IT), since the PR is already too big to review.


Element getters mirror `InternalRow` typed getters and return `Result<T>`. For example, use `get_int()`, `get_long()`, and `get_double()` for primitive elements, and `get_string()`, `get_binary()`, `get_decimal()`, `get_timestamp_ntz()`, `get_timestamp_ltz()`, and `get_array()` for variable-length or nested elements.

TODO: `FlussArray` currently exposes the fallible getter surface as the stable API. Infallible fast-path variants may be added later as non-breaking extensions if needed.
Contributor

Why leave a TODO in a user-facing document?

Contributor Author

It was added to address a Copilot comment (66359e4) about the current design decision: FlussArray only has fallible getters for now, and we might add infallible variants in the future as a non-breaking change.

But I agree it is confusing to leave a TODO in user-facing docs, so let me remove this part completely and move the note into the codebase instead @luoyuxia
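
To make the fallible-getter design concrete, here is a rough sketch using a toy array type. `ToyIntArray`, `ArrayError`, and `from_values` are hypothetical and do not reflect the real `FlussArray` layout or API; only the idea of a getter returning `Result<T>` comes from the doc text under review:

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ArrayError {
    IndexOutOfBounds { index: usize, len: usize },
}

/// Toy array of 32-bit integers stored as little-endian bytes (not the real layout).
pub struct ToyIntArray {
    len: usize,
    data: Vec<u8>,
}

impl ToyIntArray {
    /// Builds the toy array from plain values.
    pub fn from_values(values: &[i32]) -> Self {
        let mut data = Vec::with_capacity(values.len() * 4);
        for v in values {
            data.extend_from_slice(&v.to_le_bytes());
        }
        ToyIntArray { len: values.len(), data }
    }

    /// Number of elements in the array.
    pub fn size(&self) -> usize {
        self.len
    }

    /// Fallible element getter: a bad index yields an error instead of a panic.
    pub fn get_int(&self, index: usize) -> Result<i32, ArrayError> {
        if index >= self.len {
            return Err(ArrayError::IndexOutOfBounds { index, len: self.len });
        }
        let start = index * 4;
        let mut bytes = [0u8; 4];
        bytes.copy_from_slice(&self.data[start..start + 4]);
        Ok(i32::from_le_bytes(bytes))
    }
}
```

Because the getter already returns a `Result`, an infallible fast-path variant could be added later under a different name without changing this signature.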

@charlesdong1991
Contributor Author

Hi @luoyuxia

If you don't mind, I'd prefer to handle the integration tests in a separate PR. I created issue #441 to track this, and I can work on it.

Thanks!

Development

Successfully merging this pull request may close these issues.

Array data type support in Rust

5 participants