Add support for structured dtypes to `zarr3` driver, open structs as void by BrianMichell · Pull Request #271 · google/tensorstore

BrianMichell · 2026-01-05T15:22:04Z

Supersedes #264

Resolves comments 1 and 2

…river

…t for raw bits dtype

Implement shim for `open_as_void` driver level flag

* Begin removing void field shim * Fully removed void string shim * Cleanup debug prints * Remove shimmed validation * Remove unnecessary comment * Prefer false over zero for ternary clarity

* Implement a more general and portable example set * Fix driver cache bug * Update example for template * Cleanup example * Remove testing examples from source

* Use the appropriate fill value for open_as_void structured data * Cleanup

laramiel · 2026-01-22T23:18:00Z

I'll try to get to this in about a week, before I look this one over, please double check that the prior PR works for you. Also look over this one and see if any of the suggestions from the other one applies.

Matches the pattern from zarr v2 driver (PR google#272). When both "field" and "open_as_void" are specified in the spec, return an error since these options are mutually exclusive - field selects a specific field from a structured array, while open_as_void provides raw byte access to the entire structure.

The zarr3 URL syntax cannot represent field selection or void access mode. Following the pattern from zarr v2 driver (PR google#272), ToUrl() now returns an error when either of these options is specified instead of silently ignoring them.

…trip Following the pattern from zarr v2 driver (PR google#272), override GetBoundSpecData in ZarrDataCache to set spec.open_as_void from ChunkCacheImpl::open_as_void_. This ensures that when you open a store with open_as_void=true and then call spec(), the resulting spec correctly has open_as_void=true set. Without this fix, opening a store with open_as_void=true and then getting its spec would lose the open_as_void flag, causing incorrect behavior if the spec is used to re-open the store.

laramiel · 2026-03-17T18:14:57Z

Due to limitations on our copybara import settings, is it possible for you to squash your commits into a single commit?

tensorstore/driver/zarr3/chunk_cache.cc

tensorstore/driver/zarr3/driver.cc

laramiel · 2026-03-17T20:04:04Z

tensorstore/driver/zarr3/driver_test.cc

+  // Element [0,0] should be 0xAABBCCDD
+  EXPECT_EQ(static_cast<int32_t>(0xAABBCCDD), typed_ptr[0]);
+}
+


Probably ought to test edge cases more. And more constraints.

TEST(Zarr3OpenAsVoidTest, OpenAsVoidWithRankConstraint) { // Create an array with rank 2. ::nlohmann::json create_spec{ {"driver", "zarr3"}, {"kvstore", {{"driver", "memory"}, {"path", "prefix/"}}}, {"metadata", { {"data_type", "int32"}, {"shape", {4, 4}}, {"chunk_grid", {{"name", "regular"}, {"configuration", {{"chunk_shape", {2, 2}}}}}}, }}, }; // Attempt to open with open_as_void=true and rank=3 constraint. ::nlohmann::json open_as_void_spec{ {"driver", "zarr3"}, {"kvstore", {{"driver", "memory"}, {"path", "prefix/"}}}, {"open_as_void", true}, {"rank", 3}, }; // ... create and then open. // Should this pass or fail? } TEST(Zarr3OpenAsVoidTest, OpenAsVoidWithMetadataShape) { ::nlohmann::json create_spec{ {"driver", "zarr3"}, {"kvstore", {{"driver", "memory"}, {"path", "prefix/"}}}, {"metadata", { {"data_type", "int32"}, {"shape", {4, 4}}, {"chunk_grid", {{"name", "regular"}, {"configuration", {{"chunk_shape", {2, 2}}}}}}, }}, }; ::nlohmann::json open_as_void_spec{ {"driver", "zarr3"}, {"rank", 3}, {"kvstore", {{"driver", "memory"}, {"path", "prefix/"}}}, {"open_as_void", true}, {"metadata", {{"shape", {4, 4}}}}, }; // ... create and then open. EXPECT_THAT(void_store.domain().shape(), ElementsAre(4, 4, 4)); }

I don't see these; I've suggested other variants above.

laramiel · 2026-03-17T20:10:20Z

tensorstore/driver/zarr3/driver_test.cc

       {"kvstore", {{"driver", "memory"}, {"path", "abc.zarr3/def/"}}}});
 }

+// Tests for open_as_void functionality


Where are the structured tests which are independent of open_as_void?

There should be tests for read/write of structures with fields, nested structs, etc. Kind of like this:

TEST(Zarr3StructuredTest, StructuredFieldWithArrayField) { // dtype: { "x": int32[2, 2] } ::nlohmann::json create_spec{ {"driver", "zarr3"}, {"rank", 4}, {"field", "x"}, {"kvstore", {{"driver", "memory"}, {"path", "prefix/"}}}, {"metadata", { {"data_type", {{"name", "structured"}, {"configuration", {{"fields", {{"x", "int32", {2, 2}}}}}}}}, {"shape", {4, 2}}, {"chunk_grid", {{"name", "regular"}, {"configuration", {{"chunk_shape", {2, 2}}}}}}, }}, }; TENSORSTORE_ASSERT_OK_AND_ASSIGN( auto store, tensorstore::Open(create_spec, context, tensorstore::OpenMode::create) .result()); auto data = tensorstore::MakeArray<int32_t>( {{{{1, 2}, {3, 4}}, {{5, 6}, {7, 8}}}, {{{9, 10}, {11, 12}}, {{13, 14}, {15, 16}}}, {{{17, 18}, {19, 20}}, {{21, 22}, {23, 24}}}, {{{25, 26}, {27, 28}}, {{29, 30}, {31, 32}}}}); TENSORSTORE_ASSERT_OK(tensorstore::Write(data, store).result()); TENSORSTORE_ASSERT_OK_AND_ASSIGN(auto read_result, tensorstore::Read(store).result()); EXPECT_EQ(data, read_result); }

It looks like fields can no longer be array-like. But there still are array-like fields using the "r16" syntax, so an equivalent test should exist, right?

TEST(Zarr3DriverTest, StructuredFieldWithFieldShape) { // Test reading and writing with a structured field that has a field_shape auto context = Context::Default(); // Struct layout: a (int32, 4 bytes) + b (r16, 2 bytes/elements [2]) = 6 bytes ::nlohmann::json create_spec{ {"driver", "zarr3"}, {"kvstore", {{"driver", "memory"}, {"path", "prefix_field_shape/"}}}, {"field", "b"}, {"metadata", { {"data_type", {{"name", "struct"}, {"configuration", {{"fields", ::nlohmann::json::array( {{{"name", "a"}, {"data_type", "int32"}}, {{"name", "b"}, {"data_type", "r16"}}})}}}}}, {"shape", {4, 4}}, /* rank 2 */ {"chunk_grid", {{"name", "regular"}, {"configuration", {{"chunk_shape", {2, 2}}}}}}, }}, }; TENSORSTORE_ASSERT_OK_AND_ASSIGN( auto store, tensorstore::Open(create_spec, context, tensorstore::OpenMode::create, tensorstore::ReadWriteMode::read_write) .result()); // Field 'b' (r16) should have rank original_rank + 1 = 3 ASSERT_EQ(3, store.rank()); EXPECT_THAT(store.domain().shape(), ::testing::ElementsAre(4, 4, 2)); // Write some data to field b auto data = tensorstore::MakeArray<uint8_t>( {{{1, 2}, {3, 4}}, {{5, 6}, {7, 8}}}); TENSORSTORE_ASSERT_OK( tensorstore::Write( data, store | tensorstore::Dims(0, 1).SizedInterval({0, 0}, {2, 2}))); // Read it back TENSORSTORE_ASSERT_OK_AND_ASSIGN( auto read_data, tensorstore::Read(store | tensorstore::Dims(0, 1).SizedInterval({0, 0}, {2, 2})) .result()); EXPECT_EQ(data, read_data); }

Decode directly into byte array for open on void

tensorstore/driver/zarr3/driver_test.cc

…or zarr3 driver

laramiel · 2026-03-21T07:18:08Z

tensorstore/driver/zarr3/dtype_test.cc

+  EXPECT_EQ(output["configuration"]["fields"][0]["name"], "x");
+  EXPECT_EQ(output["configuration"]["fields"][0]["data_type"], "uint8");
+}
+


TEST(ParseDType, Raw64BitFieldShape) { ::nlohmann::json input = { {"name", "struct"}, {"configuration", {{"fields", ::nlohmann::json::array({{{"name", "x"}, {"data_type", "r64"}}})}}}}; // r64 results in field_shape [8] TENSORSTORE_ASSERT_OK_AND_ASSIGN(auto dtype, ParseDType(input)); ASSERT_EQ(dtype.fields.size(), 1); EXPECT_THAT(dtype.fields[0].field_shape, ::testing::ElementsAre(8)); EXPECT_EQ(dtype.fields[0].num_bytes, 8); } TEST(ParseDType, ManyFieldsOffsets) { // How many fields are allowed? ::nlohmann::json::array_t fields; for (int i = 0; i < 1000; ++i) { fields.push_back({{"name", absl::StrCat("f", i)}, {"data_type", "int64"}}); } ::nlohmann::json input = {{"name", "struct"}, {"configuration", {{"fields", fields}}}}; TENSORSTORE_ASSERT_OK_AND_ASSIGN(auto dtype, ParseDType(input)); EXPECT_EQ(dtype.fields.size(), 1000); EXPECT_EQ(dtype.bytes_per_outer_element, 8000); EXPECT_EQ(dtype.fields[99].byte_offset, 99 * 8); EXPECT_EQ(dtype.fields[999].byte_offset, 999 * 8); }

laramiel · 2026-03-21T07:20:54Z

tensorstore/driver/zarr3/dtype_test.cc

+          /*.bytes_per_outer_element=*/3,
+      });
+}
+


TEST(ParseDType, StructuredWithFieldShape) { ::nlohmann::json input = { {"name", "struct"}, {"configuration", {{"fields", ::nlohmann::json::array( {{{"name", "scalar"}, {"data_type", "int32"}}, {{"name", "array"}, {"data_type", "r16"}}})}}}}; ZarrDType expected{ /*.has_fields=*/true, /*.fields=*/ { {{ /*.encoded_dtype=*/"int32", /*.dtype=*/dtype_v<int32_t>, /*.flexible_shape=*/{}, }, /*.name=*/"scalar", /*.field_shape=*/{}, /*.num_inner_elements=*/1, /*.byte_offset=*/0, /*.num_bytes=*/4}, {{ /*.encoded_dtype=*/"r16", /*.dtype=*/dtype_v<tensorstore::dtypes::byte_t>, /*.flexible_shape=*/{2}, }, /*.name=*/"array", /*.field_shape=*/{2}, /*.num_inner_elements=*/2, /*.byte_offset=*/4, /*.num_bytes=*/2}, }, /*.bytes_per_outer_element=*/6, }; CheckDType(input, expected); }

laramiel · 2026-03-21T07:28:09Z

tensorstore/driver/zarr3/driver_test.cc

+                  absl::StatusCode::kInvalidArgument,
+                  ".*open_as_void is only supported for structured dtypes.*"));
+}
+


If someone provides constraints when using open_as_void, then they should be properly validated against the schema, right? Do you have a test equivalent to this? The errors below are approximations of what they should be.

TEST(Zarr3OpenAsVoidTest, InvalidSchema) { auto context = Context::Default(); // Create a structured array ::nlohmann::json create_spec{ {"driver", "zarr3"}, {"kvstore", {{"driver", "memory"}, {"path", "prefix_void_invalid/"}}}, {"field", "x"}, {"metadata", { {"data_type", {{"name", "struct"}, {"configuration", {{"fields", ::nlohmann::json::array( {{{"name", "x"}, {"data_type", "uint8"}}, {{"name", "y"}, {"data_type", "int16"}}})}}}}}, {"shape", {4, 4}}, {"chunk_grid", {{"name", "regular"}, {"configuration", {{"chunk_shape", {2, 2}}}}}}, }}, }; TENSORSTORE_ASSERT_OK_AND_ASSIGN( auto store, tensorstore::Open(create_spec, context, tensorstore::OpenMode::create, tensorstore::ReadWriteMode::read_write) .result()); ::nlohmann::json void_spec{ {"driver", "zarr3"}, {"kvstore", {{"driver", "memory"}, {"path", "prefix_void_invalid/"}}}, {"open_as_void", true}, }; // Test rank mismatch: RankConstraint should be 3 for this open_as_void: EXPECT_THAT( tensorstore::Open(void_spec, context, tensorstore::RankConstraint(2)) .result(), tensorstore::MatchesStatus(absl::StatusCode::kFailedPrecondition, "Rank specified by schema \\(2\\) does not match rank " "specified by metadata \\(3\\)")); // Test dtype mismatch: open_as_void dtype must be byte. EXPECT_THAT(tensorstore::Open(void_spec, context, dtype_v<int16_t>).result(), tensorstore::MatchesStatus(absl::StatusCode::kFailedPrecondition, "data_type from metadata \\(uint8\\) does not " "match dtype in schema \\(int16\\)")); }

laramiel · 2026-03-21T07:34:07Z

tensorstore/driver/zarr3/driver_test.cc

+    return bytes[i * s0 + j * s1 + k * s2];
+  };
+
+  // [0,0]: x=0 (fill), y=100 (0x0064 LE: 0x64, 0x00)


Some of these would be nicer as array comparisons, IMO. Why are you not reading from the array instead of computing the values directly? This applies to similar lines above and below here.

https://google.github.io/tensorstore/cpp/api/tensorstore.Array.operator-call-indices.html
https://google.github.io/tensorstore/cpp/api/tensorstore.Array.operator-subscript-indices.html

laramiel · 2026-03-21T07:38:48Z

tensorstore/driver/zarr3/driver_test.cc

+  auto expected_y = tensorstore::MakeArray<int16_t>({{200}});
+  EXPECT_EQ(expected_y, y_data);
+}
+


Do the above but also with r16 (or r64). This could maybe be a TEST_P.

laramiel · 2026-03-21T15:12:39Z

tensorstore/driver/zarr3/driver.cc

@@ -123,14 +131,48 @@ class ZarrDriverSpec
          "metadata",
          jb::Validate(
              [](const auto& options, auto* obj) -> absl::Status {


Move the validation into the jb::Initialize() call, so that all the checks happen in the same location here.

laramiel · 2026-03-21T15:15:57Z

tensorstore/driver/zarr3/dtype_test.cc

+}
+
+TEST(ParseDType, SimpleStringBool) {
+  CheckDType("bool", ZarrDType{


There are several places a ZarrDType with fields is fully spelled out.
I think that I'd like a method like:

AddFieldToZarrDType(dtype, Field{...})

Where the method updates bits like byte_offsets and bytes_per_outer_elements (depending on has_fields).

laramiel · 2026-03-21T15:25:55Z

tensorstore/driver/zarr3/metadata.cc

      FillValueDataTypeFunctions::Make<::tensorstore::dtypes::T>(); \
  /**/
      TENSORSTORE_ZARR3_FOR_EACH_DATA_TYPE(TENSORSTORE_INTERNAL_DO_DEF)
+      // Add char_t support for string data types


Nit #undef TENSORSTORE_INTERNAL_DO_DEF immediately after use.

laramiel · 2026-03-21T15:37:23Z

tensorstore/driver/zarr3/driver_test.cc

+          .result());
+
+  EXPECT_EQ(3, byte_array.rank());
+  EXPECT_EQ(2, byte_array.shape()[0]);


EXPECT_THAT(byte_array.shape(), ElementsAre(2, 2, 3));

laramiel · 2026-03-21T15:37:59Z

tensorstore/driver/zarr3/driver_test.cc

+  EXPECT_EQ(2, byte_array.shape()[1]);
+  EXPECT_EQ(3, byte_array.shape()[2]);
+
+  // Verify bytes - we use the array's data() and strides


EXPECT_EQ(tensorstore::MakeArray<uint8_t>( {{{0, 0x64, 0x00}, {0, 0xC8, 0x00}}, {{0, 0x2C, 0x01}, {0, 0x90, 0x01}}}), tensorstore::StaticCast<const uint8_t>(byte_array));

Though probably keep comments.

laramiel · 2026-03-21T15:48:27Z

tensorstore/driver/zarr3/metadata.h


 namespace tensorstore {
 namespace internal_zarr3 {



kVoidFieldIndex appears in more than one location; please use a single definition.

// Sentinel value for field_index indicating access to the raw byte "void" view // of a structured array. constexpr size_t kVoidFieldIndex = size_t(-1);

laramiel · 2026-03-21T16:24:18Z

tensorstore/driver/zarr3/driver.cc

+        if (obj->open_as_void && !obj->selected_field.empty()) {
+          return absl::InvalidArgumentError(
+              "\"field\" and \"open_as_void\" are mutually exclusive");
+        }


There should probably be a method for this. I'm using auto to avoid typing out types here.
And it's probably not quite correct.

TENSORSTORE_RETURN_IF_ERROR(TrySetMetadataConstraints(obj->metadata_constraints, obj->schema, obj->selected_field, obj->open_as_void)); absl::Status TrySetMetadataConstraints( const ZarrMetadataConstraints& metadata_constraints, auto& schema, auto selected_field, bool open_as_void) auto& metadata_constraints = obj->metadata_constraints; auto rank = metadata_constraints.rank; if (rank == dynamic_rank && metadata_constraints.shape) { rank = metadata_constraints.shape->size(); } auto add_rank = [&rank](auto n) { if (rank != dynamic_rank) rank += n; }; if (open_as_void) { add_rank (1); TENSORSTORE_RETURN_IF_ERROR(schema.Set(dtype_v<std::byte>)); } else if (metadata_constraints.dtype) { // get field index if (field_index) { field = ... get field from metadata ... TENSORSTORE_RETURN_IF_ERROR(schema.Set(field->dtype)); add_rank(field->field_shape.size()); } else { // not sure; shouldn't be void. } } if (rank != dynamic_rank) { TENSORSTORE_RETURN_IF_ERROR(obj->schema.Set( RankConstraint{rank})); } // ... maybe more? return absl::OkStatus(); }

Or maybe it should return something like a modified struct SpecRankAndFieldInfo?

BrianMichell added 24 commits November 24, 2025 20:55

Begin examining how to best add structured array support to Zarr v3 d…

44b4473

…river

Merge branch 'google:master' into v3_structs

e6df164

Updates to have proper reads

187f424

Local testing and examples

c2e73cd

Begin adding support for opening struct arrays as void and add suppor…

9e8ed94

…t for raw bits dtype

Fix failing tests

44c765e

Resolve issues with opening struct as void

547642d

Remove debug print

2a4c3d8

Add field for open as void

b0abb94

Add a shim for new open_as_void flag open option

fff0a5b

Revert some formatting changes

b6c24f9

revert gitignore changes

488b160

Merge pull request #1 from BrianMichell/v3_structs_compatibility

537d309

Implement shim for `open_as_void` driver level flag

V3 structs remove shim (#2)

54941a0

* Begin removing void field shim * Fully removed void string shim * Cleanup debug prints * Remove shimmed validation * Remove unnecessary comment * Prefer false over zero for ternary clarity

Fix structured fill value population

c9f58f9

V3 examples merge (#3)

7655cfd

* Implement a more general and portable example set * Fix driver cache bug * Update example for template * Cleanup example * Remove testing examples from source

Remove vestigial example build

8c4c4ca

V3 structs fix fills (#4)

4b590f8

* Use the appropriate fill value for open_as_void structured data * Cleanup

Merge branch 'google:master' into v3_structs

7691c83

Add new options to schema

c0082a0

Fix copyright header date

9a46c82

Cleanup (#5)

b9b5e41

Merge zarr3 structured dtype support

d5f6201

Remove default values

31e55ec

BrianMichell mentioned this pull request Jan 5, 2026

Add support for structured dtypes to zarr3 driver, open zarr 2 and 3 structs as void #264

Closed

BrianMichell added 4 commits January 23, 2026 15:05

Merge branch 'google:master' into v3_structs_and_void

aad0ee0

laramiel reviewed Mar 17, 2026

View reviewed changes

tensorstore/driver/zarr3/chunk_cache.cc Outdated Show resolved Hide resolved

laramiel reviewed Mar 17, 2026

View reviewed changes

tensorstore/driver/zarr3/driver.cc Outdated Show resolved Hide resolved

laramiel reviewed Mar 17, 2026

View reviewed changes

tensorstore/driver/zarr3/driver.cc Outdated Show resolved Hide resolved

laramiel reviewed Mar 17, 2026

View reviewed changes

tensorstore/driver/zarr3/driver.cc Show resolved Hide resolved

laramiel reviewed Mar 17, 2026

View reviewed changes

BrianMichell added 2 commits March 18, 2026 14:22

Resolve google#271 (comment)

1b71e73

Decode directly into byte array for open on void

Deslop

4f79bc9

laramiel reviewed Mar 18, 2026

View reviewed changes

tensorstore/driver/zarr3/driver_test.cc Outdated Show resolved Hide resolved

BrianMichell added 8 commits March 18, 2026 18:16

Resolve google#271 (comment) and update tests to resolve google#289 f…

3c8590a

…or zarr3 driver

Expand sharding tests for struct dtype and fix errors

5314706

Consolidate redundent code

d5f9913

Add TODO for open as void validation

f7a0d72

Update fixture names

c649713

Remove accidental commit to zarr driver test

c01106f

Remove improper shape parameter support

ada7d09

Merge branch 'master' into v3_structs_and_void

2e861e1

laramiel reviewed Mar 21, 2026

View reviewed changes

Conversation

BrianMichell commented Jan 5, 2026

Uh oh!

laramiel commented Jan 22, 2026

Uh oh!

laramiel commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

laramiel Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

laramiel Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

laramiel Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

laramiel Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

laramiel Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

laramiel Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

laramiel Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

laramiel Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

laramiel commented Mar 17, 2026 •

edited

Loading

laramiel Mar 17, 2026 •

edited

Loading

laramiel Mar 17, 2026 •

edited

Loading

laramiel Mar 21, 2026 •

edited

Loading

laramiel Mar 21, 2026 •

edited

Loading

laramiel Mar 21, 2026 •

edited

Loading

laramiel Mar 21, 2026 •

edited

Loading

laramiel Mar 21, 2026 •

edited

Loading

laramiel Mar 21, 2026 •

edited

Loading