Feat : Row data type support in Rust by hemanthsavasere · Pull Request #442 · apache/fluss-rust

hemanthsavasere · 2026-03-15T19:08:03Z

Purpose

Linked issue: close #388

`DataType::Row` was defined in the schema layer, but the serialization stack was missing entirely: `CompactedRowWriter` and `CompactedRowReader` panicked, `FieldGetter` and `ValueWriter` hit `unimplemented!()` macros, and `Datum` had no `Row` variant. This PR wires up the full stack.

Brief change log

- Add the `Datum::Row` variant and an `as_row()` accessor.
- Add `get_row()` to the `InternalRow` trait, with a default implementation that returns an error.
- Implement `get_row()` on `GenericRow` and `CompactedRow`.
- Implement `ColumnarRow::get_row()` via Arrow `StructArray` extraction, with a per-column `OnceLock` cache that is invalidated on `set_row_id()`.
- Add `InnerValueWriter::Row`, which serializes the nested row into a temporary `CompactedRowWriter` and then calls `write_bytes`, matching the Java wire format.
- Add a `DataType::Row` arm to `CompactedRowDeserializer` that uses `read_bytes` and deserializes the nested row recursively.
- Add `InnerFieldGetter::Row`, which automatically enables ROW in `CompactedKeyEncoder`.
- Handle `Datum::Row` in `resolve_row_types` in the C++ bindings.

Tests

- `test_row_simple_nesting`: round-trip of a ROW of INT and STRING.
- `test_row_deep_nesting`: round-trip of a ROW of ROW of INT.
- `test_row_with_nullable_fields`: a null field inside a nested row, and a null outer ROW column.
- `test_row_as_primary_key`: ROW through `CompactedKeyEncoder`; asserts that the output is non-empty, deterministic, and distinguishable.
- `columnar_row_reads_nested_row`, `columnar_row_reads_deeply_nested_row`, and `columnar_row_get_row_cache_invalidated_on_set_row_id`: Arrow `StructArray` extraction.

API and Format

API: `Datum::Row` is a new variant, so exhaustive matches on `Datum` will need a new arm. `InternalRow::get_row()` is additive and has a default implementation.
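A minimal sketch of what this API change means for callers. It uses hypothetical, heavily simplified stand-ins for `Datum`, `GenericRow`, and `InternalRow`: the crate's real types have more variants and methods and their own error type, so the `String` errors and the `EmptyRow` and `describe` helpers here are assumptions for illustration only.

```rust
/// Hypothetical, simplified stand-in for the crate's `Datum`.
#[derive(Debug, Clone, PartialEq)]
pub enum Datum {
    Null,
    Int32(i32),
    String(String),
    /// New variant: a nested row, mirroring `Datum::Row(Box<GenericRow>)`.
    Row(Box<GenericRow>),
}

impl Datum {
    /// Returns the nested row if this datum is a `Row`, otherwise `None`.
    pub fn as_row(&self) -> Option<&GenericRow> {
        match self {
            Datum::Row(row) => Some(row),
            _ => None,
        }
    }
}

/// Hypothetical stand-in for `GenericRow`: just a vector of values.
#[derive(Debug, Clone, PartialEq)]
pub struct GenericRow {
    pub values: Vec<Datum>,
}

pub trait InternalRow {
    fn get_int(&self, pos: usize) -> Result<i32, String>;

    /// Additive method: the default returns an error, so existing
    /// implementors keep compiling unchanged.
    fn get_row(&self, pos: usize) -> Result<&GenericRow, String> {
        Err(format!("get_row is not supported (field {pos})"))
    }
}

impl InternalRow for GenericRow {
    fn get_int(&self, pos: usize) -> Result<i32, String> {
        match self.values.get(pos) {
            Some(Datum::Int32(v)) => Ok(*v),
            _ => Err(format!("field {pos} is not an INT")),
        }
    }

    fn get_row(&self, pos: usize) -> Result<&GenericRow, String> {
        self.values
            .get(pos)
            .and_then(Datum::as_row)
            .ok_or_else(|| format!("field {pos} is not a ROW"))
    }
}

/// A row type that does not override `get_row`, so it gets the default.
struct EmptyRow;

impl InternalRow for EmptyRow {
    fn get_int(&self, pos: usize) -> Result<i32, String> {
        Err(format!("no field {pos}"))
    }
}

/// Exhaustive matches written before the new variant need an extra arm.
fn describe(d: &Datum) -> &'static str {
    match d {
        Datum::Null => "null",
        Datum::Int32(_) => "int",
        Datum::String(_) => "string",
        Datum::Row(_) => "row",
    }
}
```

Because `get_row` has a default body, implementors that never hold nested rows (like `EmptyRow`) need no change; only code that matches exhaustively on `Datum` has to add the `Row` arm.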

Wire format: unchanged. A ROW value is encoded as a varint length prefix followed by a CompactedRow blob, the same framing used for String and Bytes. This matches the Java reference byte-for-byte.
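The framing can be sketched as follows. This assumes the length prefix is an unsigned LEB128-style varint (7 data bits per byte, high bit as a continuation flag); the helpers `write_varint_u32`, `read_varint_u32`, `write_bytes`, and `read_bytes` are hypothetical and not the crate's actual API, and the real `CompactedRowWriter` encoding may differ in details.

```rust
/// Appends `value` as an unsigned LEB128-style varint: 7 data bits per
/// byte, least-significant group first, high bit set on all but the last.
fn write_varint_u32(buf: &mut Vec<u8>, mut value: u32) {
    while value >= 0x80 {
        buf.push(((value & 0x7F) as u8) | 0x80);
        value >>= 7;
    }
    buf.push(value as u8);
}

/// Reads a varint (at most 5 bytes for a u32) starting at `*pos`,
/// advancing `*pos` past it.
fn read_varint_u32(buf: &[u8], pos: &mut usize) -> Result<u32, String> {
    let mut result = 0u32;
    for shift in (0..35).step_by(7) {
        let byte = *buf.get(*pos).ok_or("truncated varint")?;
        *pos += 1;
        result |= u32::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Ok(result);
        }
    }
    Err("varint longer than 5 bytes".to_string())
}

/// Length-prefixed framing: varint(len) followed by the raw bytes. For a
/// ROW value the payload would be a complete serialized nested row.
fn write_bytes(buf: &mut Vec<u8>, payload: &[u8]) {
    write_varint_u32(buf, payload.len() as u32);
    buf.extend_from_slice(payload);
}

/// Reads one length-prefixed payload, borrowing it from `buf`.
fn read_bytes<'a>(buf: &'a [u8], pos: &mut usize) -> Result<&'a [u8], String> {
    let len = read_varint_u32(buf, pos)? as usize;
    let end = (*pos)
        .checked_add(len)
        .filter(|&end| end <= buf.len())
        .ok_or("truncated payload")?;
    let payload = &buf[*pos..end];
    *pos = end;
    Ok(payload)
}
```

For example, a length of 300 encodes as the two bytes `0xAC 0x02`, and a 2-byte payload gets a single-byte prefix `0x02`. A reader that already handles STRING/BYTES framing can therefore extract the nested row's bytes without knowing its schema, and only then hand them to a nested deserializer.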

Documentation

No new documentation needed.

- Add `Datum::Row(Box<GenericRow>)` variant with `as_row()` accessor
- Add `get_row()` to `InternalRow` trait with default error impl
- Implement `GenericRow::get_row()` and `CompactedRow::get_row()` delegation
- Implement `ColumnarRow::get_row()` with Arrow StructArray extraction + OnceLock caching
- Add `InnerValueWriter::Row(RowType)` and write path via nested CompactedRowWriter
- Add `DataType::Row` arm in `CompactedRowDeserializer` for eager nested decode
- Add `InnerFieldGetter::Row` and hook up FieldGetter/ValueWriter pipeline
- Handle `Datum::Row` in `resolve_row_types` (C++ bindings)
- Add round-trip tests: simple nesting, deep nesting, nullable fields, ROW as primary key

Wire format matches Java: varint-length-prefixed blob of a complete CompactedRow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

hemanthsavasere · 2026-03-15T19:25:22Z

Hi @fresh-borzoni,
Could you please review the PR? Thanks!

leekeiabstraction · 2026-03-16T19:32:05Z

@charlesdong1991, it would be good if you could review this as well, since you've worked on the array type.

leekeiabstraction

Thank you for your contribution! Left some comments/questions, PTAL.

leekeiabstraction · 2026-03-16T19:43:03Z

crates/fluss/src/row/binary/binary_writer.rs

            (InnerValueWriter::TimestampLtz(p), Datum::TimestampLtz(ts)) => {
                writer.write_timestamp_ltz(ts, *p);
            }
+            (InnerValueWriter::Row(row_type), Datum::Row(inner_row)) => {


Why don't we delegate, as the Java side does?

https://github.com/apache/fluss/blob/main/fluss-common/src/main/java/org/apache/fluss/row/BinaryWriter.java#L176

leekeiabstraction · 2026-03-16T19:55:59Z

crates/fluss/src/row/column.rs

+
+        match array.data_type() {
+            ArrowDataType::Boolean => {
+                let a = array.as_any().downcast_ref::<BooleanArray>().unwrap();


Should we return an error with an appropriate message instead of unwrapping and panicking?
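One way to do that, sketched with `std::any::Any` standing in for Arrow: arrow-rs's `Array::as_any()` returns `&dyn Any`, so the same `downcast_ref` plus `ok_or_else` pattern applies. The `downcast` helper, the local `BooleanArray` type, and the `String` error are hypothetical; the crate would presumably use its own error type.

```rust
use std::any::Any;

/// Hypothetical stand-in for an Arrow array type.
#[derive(Debug)]
struct BooleanArray(Vec<bool>);

/// Downcasts a type-erased column to `T`, returning a descriptive error
/// on a type mismatch instead of panicking the way `.unwrap()` would.
fn downcast<'a, T: Any>(array: &'a dyn Any, expected: &str) -> Result<&'a T, String> {
    array.downcast_ref::<T>().ok_or_else(|| {
        format!("expected a {expected} array, but the column has a different type")
    })
}
```

Each `ArrowDataType` arm can then use `downcast::<BooleanArray>(array.as_any(), "Boolean")?` (or the equivalent inline `ok_or_else`) and let the caller decide how to surface the mismatch.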

leekeiabstraction · 2026-03-16T20:05:49Z

crates/fluss/src/row/column.rs

        );
    }
+
+    fn make_struct_batch(


Is this for tests only? If so, should we move it under the `#[cfg(test)]` module?

leekeiabstraction · 2026-03-16T20:13:21Z

crates/fluss/src/row/column.rs

+
+        // Access outer struct at column 0, row 0
+        let outer = row.get_row(0).unwrap();
+        assert_eq!(outer.get_int(0).unwrap(), 1);


Nit: should we assert the second row as well?

leekeiabstraction · 2026-03-16T20:21:22Z

crates/fluss/src/row/compacted/compacted_row_reader.rs

+                        0,
+                        nested_bytes.len(),
+                    );
+                    let nested_deser = CompactedRowDeserializer::new_from_owned(row_type.clone());


Can we use `new`, which borrows, instead? It seems `nested_deser` does not live beyond the current scope anyway.
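A sketch of the borrow-versus-own distinction behind this suggestion. It assumes, hypothetically, that the deserializer keeps its schema in a `Cow`; the `Deserializer` and `RowType` types below are simplified stand-ins, not the crate's.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified schema type.
#[derive(Debug, Clone, PartialEq)]
struct RowType {
    field_names: Vec<String>,
}

/// Hypothetical deserializer that can either borrow or own its schema.
struct Deserializer<'a> {
    row_type: Cow<'a, RowType>,
}

impl<'a> Deserializer<'a> {
    /// Borrows the schema: no clone, but the deserializer cannot outlive it.
    fn new(row_type: &'a RowType) -> Self {
        Deserializer { row_type: Cow::Borrowed(row_type) }
    }

    /// Takes ownership of the schema, so the result is `'static`.
    fn new_from_owned(row_type: RowType) -> Deserializer<'static> {
        Deserializer { row_type: Cow::Owned(row_type) }
    }

    fn field_count(&self) -> usize {
        self.row_type.field_names.len()
    }
}
```

With the borrowing constructor, a nested deserializer created inside the `DataType::Row` arm can reference the enclosing schema's `RowType` for its short lifetime, avoiding a `RowType` clone per nested value.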

leekeiabstraction · 2026-03-16T20:29:55Z

Additionally, could you please add coverage to the existing integration test? Thank you!

charlesdong1991

Thanks for the PR! Left a couple of comments.

charlesdong1991 · 2026-03-17T21:05:29Z

crates/fluss/src/row/binary/binary_writer.rs

            (InnerValueWriter::TimestampLtz(p), Datum::TimestampLtz(ts)) => {
                writer.write_timestamp_ltz(ts, *p);
            }
+            (InnerValueWriter::Row(row_type), Datum::Row(inner_row)) => {


+1

Currently, I think a new writer is created per write call, which is not ideal.

charlesdong1991 · 2026-03-17T21:09:00Z

crates/fluss/src/row/binary/binary_writer.rs

                // Validation is done at TimestampLTzType construction time
                Ok(InnerValueWriter::TimestampLtz(t.precision()))
            }
+            DataType::Row(row_type) => Ok(InnerValueWriter::Row(row_type.clone())),


I think we should not store a clone in `InnerValueWriter::Row`; storing pre-built child writers is probably better?

charlesdong1991 · 2026-03-17T21:11:22Z

crates/fluss/src/row/binary/binary_writer.rs

+                            InnerValueWriter::create_inner_value_writer(&field.data_type, None)
+                                .expect("create_inner_value_writer failed for nested row field");
+                        vw.write_value(&mut nested, i, datum)
+                            .expect("write_value failed for nested row field");
+                    }


I think that, as currently written, this will panic inside a function that returns a `Result`; does that mean those errors will be hidden from users?

charlesdong1991 · 2026-03-17T21:12:38Z

crates/fluss/src/row/column.rs

+
+        match array.data_type() {
+            ArrowDataType::Boolean => {
+                let a = array.as_any().downcast_ref::<BooleanArray>().unwrap();


charlesdong1991 · 2026-03-17T21:13:28Z

crates/fluss/src/row/binary/binary_writer.rs

+                let field_count = row_type.fields().len();
+                let mut nested = CompactedRowWriter::new(field_count);
+                for (i, field) in row_type.fields().iter().enumerate() {
+                    let datum = &inner_row.values[i];


Potential panic on out-of-bounds access?
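The binary_writer.rs comments above (delegating to pre-built child writers instead of storing a cloned `RowType`, panics via `expect` on a `Result`-returning path, and unchecked `values[i]` indexing) can be addressed together. A minimal sketch with hypothetical, simplified `DataType`, `Datum`, and `ValueWriter` types and a toy byte encoding (INT as 4 little-endian bytes, ROW as a 1-byte length plus nested fields) that is not the CompactedRow format:

```rust
/// Hypothetical, simplified schema and value types.
#[derive(Debug)]
enum DataType {
    Int,
    Row(Vec<DataType>),
}

#[derive(Debug)]
enum Datum {
    Int(i32),
    Row(Vec<Datum>),
}

/// Writers are resolved once per schema; `Row` holds pre-built child
/// writers rather than a cloned row type.
#[derive(Debug)]
enum ValueWriter {
    Int,
    Row(Vec<ValueWriter>),
}

impl ValueWriter {
    fn create(data_type: &DataType) -> ValueWriter {
        match data_type {
            DataType::Int => ValueWriter::Int,
            DataType::Row(fields) => {
                ValueWriter::Row(fields.iter().map(ValueWriter::create).collect())
            }
        }
    }

    /// Writes `datum` with the toy encoding; every failure is returned
    /// as an error instead of panicking.
    fn write(&self, out: &mut Vec<u8>, datum: &Datum) -> Result<(), String> {
        match (self, datum) {
            (ValueWriter::Int, Datum::Int(v)) => out.extend_from_slice(&v.to_le_bytes()),
            (ValueWriter::Row(children), Datum::Row(values)) => {
                // Check arity up front instead of indexing `values[i]`.
                if children.len() != values.len() {
                    return Err(format!(
                        "ROW expects {} fields, got {}",
                        children.len(),
                        values.len()
                    ));
                }
                let mut nested = Vec::new();
                for (child, value) in children.iter().zip(values) {
                    // Propagate nested failures with `?` rather than `expect`.
                    child.write(&mut nested, value)?;
                }
                if nested.len() > u8::MAX as usize {
                    return Err("nested row too large for the toy format".to_string());
                }
                out.push(nested.len() as u8);
                out.extend_from_slice(&nested);
            }
            (writer, datum) => return Err(format!("{writer:?} cannot write {datum:?}")),
        }
        Ok(())
    }
}
```

The child writers are built once in `create`, and `write` only dispatches to them. This sketch still allocates a `nested` buffer per ROW value; reusing a scratch buffer would be a further refinement.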

charlesdong1991 · 2026-03-17T21:14:28Z

crates/fluss/src/row/column.rs

+        })?;
+        let batch = Arc::clone(&self.record_batch);
+        let row_id = self.row_id;
+        Ok(lock.get_or_init(|| {


Maybe it would be better to use `get_or_try_init` here?
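At the time of writing, `OnceLock::get_or_try_init` is unstable (feature `once_cell_try`), so on stable Rust it has to be emulated. A sketch of a per-column cache with that behavior and invalidation on `set_row_id`; `RowCache` and `cached_or_try_init` are hypothetical names, not the crate's API.

```rust
use std::sync::OnceLock;

/// Hypothetical per-column cache of values derived for the current row.
struct RowCache<T> {
    row_id: usize,
    slots: Vec<OnceLock<T>>,
}

impl<T> RowCache<T> {
    fn new(columns: usize) -> Self {
        RowCache {
            row_id: 0,
            slots: (0..columns).map(|_| OnceLock::new()).collect(),
        }
    }

    /// Moving to another row clears every cached slot.
    fn set_row_id(&mut self, row_id: usize) {
        self.row_id = row_id;
        for slot in &mut self.slots {
            slot.take();
        }
    }

    /// Stable emulation of `get_or_try_init`: runs `init` only when the
    /// slot is empty and caches the value only on success. Panics if
    /// `col` is out of range.
    fn cached_or_try_init<E>(
        &self,
        col: usize,
        init: impl FnOnce(usize) -> Result<T, E>,
    ) -> Result<&T, E> {
        let slot = &self.slots[col];
        if let Some(value) = slot.get() {
            return Ok(value);
        }
        let value = init(self.row_id)?;
        // If another thread filled the slot first, `set` returns Err and
        // that thread's value is kept; either way the slot is now full.
        let _ = slot.set(value);
        Ok(slot.get().expect("slot was just filled"))
    }
}
```

Failed initializations are not cached, so a later call can retry, which is the main behavioral difference from `get_or_init` with an inner `unwrap`. `set_row_id` takes `&mut self`, which is what allows it to clear the slots with `OnceLock::take`.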

leekeiabstraction · 2026-03-18T08:03:44Z

FYI: we're cutting a release branch soon, so reviews and merges are taking longer. Just wanted contributors to be aware of the longer turnaround.

hemanthsavasere and others added 3 commits March 15, 2026 18:27

chore: remove thoughts/ from tracking and add to .gitignore

13a75d6

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: restore .gitignore to match main

c5ca7c3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

hemanthsavasere changed the title ~~feat: add end-to-end ROW (nested struct) column serialization support~~ Row data type support in Rust Mar 15, 2026

hemanthsavasere changed the title ~~Row data type support in Rust~~ [Feat] Row data type support in Rust Mar 15, 2026

hemanthsavasere changed the title ~~[Feat] Row data type support in Rust~~ Feat : Row data type support in Rust Mar 15, 2026

leekeiabstraction reviewed Mar 16, 2026

charlesdong1991 reviewed Mar 17, 2026
