-
Notifications
You must be signed in to change notification settings - Fork 13
Add xattr and chunked_format documentation #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
SToPire
wants to merge
3
commits into
erofs:main
Choose a base branch
from
SToPire:to-be-merge-xattr
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,135 @@ | ||
| (chunk_based_format)= | ||
| # Chunk-based File Format | ||
|
|
||
| The chunk-based file format allows large files to be split into fixed-size chunks, | ||
| each backed by consecutive physical blocks. This enables efficient deduplication | ||
| and multi-device storage by addressing each chunk independently. | ||
|
|
||
| ## Superblock Fields for Chunked Files | ||
|
|
||
| The core superblock format is defined in {ref}`on_disk_superblock`. This | ||
| section lists the extended fields dedicated to chunk-based file support and | ||
| multi-device addressing features. | ||
|
|
||
| | Offset | Size | Type | Name | Description | | ||
| |--------|------|-------|--------------------|-------------| | ||
| | 0x50 | 4 | `u32` | `feature_incompat` | `EROFS_FEATURE_INCOMPAT_CHUNKED_FILE` enables chunk-based inodes. `EROFS_FEATURE_INCOMPAT_DEVICE_TABLE` enables the external device table described in {ref}`device_table` | | ||
| | 0x56 | 2 | `u16` | `extra_devices` | Number of external devices. Valid when `EROFS_FEATURE_INCOMPAT_DEVICE_TABLE` is set | | ||
| | 0x58 | 2 | `u16` | `devt_slotoff` | Start slot offset of the external device table in the metadata area. Valid when `EROFS_FEATURE_INCOMPAT_DEVICE_TABLE` is set | | ||
|
|
||
| ## Inode Fields for Chunked Files | ||
|
|
||
| The full compact and extended inode layouts are defined in {ref}`on_disk_inodes`. | ||
| For chunk-based files, both inode variants share the same feature-specific | ||
| fields: | ||
|
|
||
| | Offset | Size | Type | Name | Description | | ||
| |--------|------|-------|------------|-------------| | ||
| | 0x00 | 2 | `u16` | `i_format` | Bits 1-3 encode `EROFS_INODE_CHUNK_BASED` (4) | | ||
| | 0x10 | 4 | `u32` | `i_u` | Chunk info record described below | | ||
|
|
||
| ## Inode Data Layout for Chunked Files | ||
|
|
||
| The data layout of an inode is encoded in bits 1–3 of `i_format`. | ||
|
|
||
| ### `EROFS_INODE_CHUNK_BASED` (4) | ||
|
|
||
| The entire inode data is split into fixed-size chunks, each occupying consecutive | ||
| physical blocks. Requires `EROFS_FEATURE_INCOMPAT_CHUNKED_FILE`. `i_u` encodes a | ||
| chunk info record and an array of per-chunk address entries follows the inode body. | ||
|
|
||
| (chunk_based_structures)= | ||
| ## Chunk-based Structures | ||
|
|
||
| When the data layout is `EROFS_INODE_CHUNK_BASED`, the `i_u` field (4 bytes at | ||
| inode offset 0x10) is interpreted as a chunk info record: | ||
|
|
||
| ### Chunk Info Record | ||
|
|
||
| | Bits | Width | Description | | ||
| |-------|-------|-------------| | ||
| | 0–4 | 5 | `chunkbits`: chunk size = `2^(blkszbits + chunkbits)` | | ||
| | 5 | 1 | `EROFS_CHUNK_FORMAT_INDEXES`: entry format selector (see below) | | ||
| | 6 | 1 | 48-bit layout specific; ignored for normal chunk-based inodes | | ||
| | 7–31 | 25 | Reserved; must be 0 | | ||
|
|
||
| An array of per-chunk address entries is stored immediately after the inode body | ||
| (and inline xattr region, if any). The number of entries is | ||
| `⌈i_size / chunk_size⌉`. | ||
|
|
||
| ### Chunk Entry Formats | ||
|
|
||
| The `EROFS_CHUNK_FORMAT_INDEXES` bit in the chunk info record selects one of two | ||
| per-chunk entry formats: | ||
|
|
||
| #### Block Map Entry (4 bytes) | ||
|
|
||
| When `EROFS_CHUNK_FORMAT_INDEXES` is not set, each chunk is described by a | ||
| single 32-bit block address entry. | ||
|
|
||
| Without a device table, this entry is a primary-device block address. | ||
| With `EROFS_FEATURE_INCOMPAT_DEVICE_TABLE`, it is interpreted in | ||
| the unified address space and can resolve to either the primary or an extra device. | ||
|
|
||
| | Offset | Size | Type | Name | Description | | ||
| |--------|------|-------|------------|-------------| | ||
| | 0x00 | 4 | `u32` | `startblk` | Starting block address of this chunk | | ||
|
|
||
| #### Chunk Index Entry (8 bytes) | ||
|
|
||
| When `EROFS_CHUNK_FORMAT_INDEXES` is set, each chunk is described by an 8-byte | ||
| record that supports multi-device addressing. | ||
|
|
||
| | Offset | Size | Type | Name | Description | | ||
| |--------|------|-------|-------------|-------------| | ||
| | 0x00 | 2 | `u16` | _dontcare_ | 48-bit layout specific; ignored for normal chunk-based inodes | | ||
| | 0x02 | 2 | `u16` | `device_id` | External device index; 0 = primary device | | ||
| | 0x04 | 4 | `u32` | `startblk` | 32-bit starting block address | | ||
|
|
||
| The `device_id` indexes into the external device table to resolve the physical | ||
| device and unified address offset. See | ||
| {ref}`address-resolution-for-chunk-based-inodes`. | ||
|
|
||
| (multi_device_support)= | ||
| ## Multi-device Support | ||
|
|
||
| (device_table)= | ||
| ### Device Table | ||
|
|
||
| This section applies when `EROFS_FEATURE_INCOMPAT_DEVICE_TABLE` is set. | ||
|
|
||
| When `EROFS_FEATURE_INCOMPAT_DEVICE_TABLE` is set, the `extra_devices` superblock | ||
| field gives the number of additional block devices. They are described by an array | ||
| of 128-byte device slot records. The array begins at slot offset `devt_slotoff` | ||
| within the metadata area, where each slot is 128 bytes. | ||
|
|
||
| Each record contains: | ||
|
|
||
| | Offset | Size | Type | Name | Description | | ||
| |--------|------|--------|------------|-------------| | ||
| | 0x00 | 64 | `u8[]` | `tag` | Device identifier (e.g. SHA-256 digest); not null-terminated | | ||
| | 0x40 | 4 | `u32` | `blocks` | 32-bit total block count of this device | | ||
| | 0x44 | 4 | `u32` | `uniaddr` | 32-bit unified starting block address of this device | | ||
| | 0x48 | 6 | `u8[]` | _dontcare_ | 48-bit layout specific; ignored for normal chunk-based inodes | | ||
| | 0x4E | 50 | `u8[]` | _reserved_ | Reserved; must be 0 | | ||
|
|
||
| (address-resolution-for-chunk-based-inodes)= | ||
| ### Address Resolution for Chunk-based Inodes | ||
|
|
||
| #### Chunk Index Entry (8 bytes) | ||
|
|
||
| When the chunk index entry format is used, each entry carries an explicit | ||
| `device_id` that directly identifies the target device: | ||
|
|
||
| - `device_id = 0`: `startblk` is a block address on the primary device. | ||
| - `device_id = N` (1 ≤ N ≤ `extra_devices`): `startblk` is a block address on | ||
| extra device N, whose slot is at `devt_slotoff + (N − 1)` in the metadata area. | ||
|
|
||
| #### Block Map Entry (4 bytes) | ||
|
|
||
| The simple block map entry format has no `device_id` field. Instead, `startblk` | ||
| is an absolute block address in the unified address space, and the reader uses | ||
| `uniaddr` from the device table to identify the target device: it finds the | ||
| slot `i` whose range `[uniaddr[i], uniaddr[i] + blocks[i])` contains | ||
| `startblk`, then derives the intra-device block address as | ||
| `startblk − uniaddr[i]`. | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,203 @@ | ||
| (xattrs)= | ||
| # Extended Attributes (Xattrs) | ||
|
|
||
| EROFS supports Extended Attributes | ||
| ([xattr(7)](https://man7.org/linux/man-pages/man7/xattr.7.html)) which are | ||
| `name:value` pairs associated permanently with inodes since the initial Linux | ||
| 5.4 version. | ||
|
|
||
| ## Superblock Fields for Xattr Support | ||
|
|
||
| The core superblock format is defined in {ref}`on_disk_superblock`. This | ||
| section lists the extended fields dedicated to the xattr features. | ||
|
|
||
| | Offset | Size | Type | Name | Description | | ||
| |--------|------|-------|--------------------------|-------------| | ||
| | 0x08 | 4 | `u32` | `feature_compat` | Xattr-related compatible feature flags; see {ref}`xattr_feature_flags` | | ||
| | 0x2C | 4 | `u32` | `xattr_blkaddr` | Start block address of the shared xattr area; see {ref}`shared_xattr_area` | | ||
| | 0x50 | 4 | `u32` | `feature_incompat` | Xattr-related incompatible feature flags; see {ref}`xattr_feature_flags` | | ||
| | 0x5B | 1 | `u8` | `xattr_prefix_count` | Number of long xattr name prefixes; valid when `EROFS_FEATURE_INCOMPAT_XATTR_PREFIXES` is set; see {ref}`long_xattr_prefixes` | | ||
| | 0x5C | 4 | `u32` | `xattr_prefix_start` | Location of the standalone long xattr prefix table when `EROFS_FEATURE_INCOMPAT_XATTR_PREFIXES` and `EROFS_FEATURE_COMPAT_PLAIN_XATTR_PFX` are both set; see {ref}`long_xattr_prefixes` | | ||
| | 0x60 | 8 | `u64` | `packed_nid` | Packed inode NID. Relevant when long xattr prefixes are embedded in the packed inode's data region | | ||
| | 0x68 | 1 | `u8` | `xattr_filter_reserved` | Must be 0 for the xattr Bloom filter to operate; see {ref}`xattr_filter` | | ||
| | 0x69 | 1 | `u8` | `ishare_xattr_prefix_id` | Long-prefix table index used by image-share xattrs; valid when `EROFS_FEATURE_COMPAT_ISHARE_XATTRS` is set; see {ref}`image_share_xattrs` | | ||
| | 0x80 | 8 | `u64` | `metabox_nid` | Metabox inode NID. Relevant when shared xattrs or long xattr prefixes are stored in the metabox inode's data region | | ||
|
|
||
| (xattr_feature_flags)= | ||
| ## Xattr Feature Flags | ||
|
|
||
| The following superblock feature bits are directly relevant to xattr support. | ||
|
|
||
| ### `feature_compat` Bits | ||
|
|
||
| | Bit mask | Name | Description | | ||
| |--------------|---------------------------------------------|-------------| | ||
| | `0x00000020` | `EROFS_FEATURE_COMPAT_XATTR_FILTER` | Enables the per-inode xattr Bloom filter described in {ref}`xattr_filter` | | ||
| | `0x00000040` | `EROFS_FEATURE_COMPAT_PLAIN_XATTR_PFX` | Stores the long xattr prefix table as a standalone region; valid when `EROFS_FEATURE_INCOMPAT_XATTR_PREFIXES` is set; see {ref}`long_xattr_prefixes` | | ||
| | `0x00000080` | `EROFS_FEATURE_COMPAT_SHARED_EA_IN_METABOX` | Stores the shared xattr area in the metabox inode's decoded data region instead of at `xattr_blkaddr` | | ||
| | `0x00000100` | `EROFS_FEATURE_COMPAT_ISHARE_XATTRS` | Enables image-share xattrs and makes `ishare_xattr_prefix_id` valid; see {ref}`image_share_xattrs` | | ||
|
|
||
| ### `feature_incompat` Bits | ||
|
|
||
| | Bit mask | Name | Description | | ||
| |--------------|--------------------------------------|-------------| | ||
| | `0x00000020` | `EROFS_FEATURE_INCOMPAT_XATTR_PREFIXES` | Enables the long xattr prefix table described in {ref}`long_xattr_prefixes` | | ||
|
|
||
| (xattr_inode_fields)= | ||
| ## Inode Fields for Xattr Support | ||
|
|
||
| The full compact and extended inode layouts are defined in {ref}`on_disk_inodes`. | ||
| For xattr support, both inode variants share the same feature-specific field: | ||
|
|
||
| | Offset | Size | Type | Name | Description | | ||
| |--------|------|-------|------------------|-------------| | ||
| | 0x02 | 2 | `u16` | `i_xattr_icount` | When non-zero, the inline xattr region size is `(i_xattr_icount - 1) * 4 + 12` bytes | | ||
SToPire marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
|
||
| Two storage classes exist: | ||
|
|
||
| - **Inline xattrs**: stored directly in the metadata block immediately following | ||
| the inode body (and any inline data tail). They are private to the inode and | ||
| encoded as a sequence of xattr entry records within the inline xattr region | ||
| described by `i_xattr_icount`. | ||
| - **Shared xattrs**: stored once in the global shared xattr area. An inode references | ||
| a shared entry through a 4-byte index stored immediately after the fixed | ||
| 12-byte inline xattr header. | ||
| Multiple inodes that carry an identical xattr (same name and value) can reference | ||
| the same shared entry, avoiding redundant per-inode copies. | ||
|
|
||
| To further reduce storage overhead for xattrs whose names share a common prefix | ||
| (for example `trusted.overlay.*` or `security.ima.*`), EROFS supports a long xattr | ||
| prefix table (`EROFS_FEATURE_INCOMPAT_XATTR_PREFIXES`). Each prefix entry records a | ||
| `base_index` that refers to one of the standard short namespace prefixes and an | ||
| additional infix string. An xattr entry whose `e_name_index` selects a long prefix | ||
| stores only the suffix that follows the full reconstructed prefix, rather than | ||
| repeating the prefix in every entry. | ||
|
|
||
| (inline_xattr_region)= | ||
| ## Inline Xattr Region Layout | ||
|
|
||
| The inline xattr region immediately follows the inode body (at a 4-byte aligned | ||
| offset). Its total size is `(i_xattr_icount - 1) * 4 + 12` bytes, where 12 is the | ||
| fixed size of the inline xattr body header. The region is structured as: | ||
|
|
||
| 1. The {ref}`inline_xattr_body_header` (12 bytes, fixed). | ||
| 2. `h_shared_count` × 4-byte shared xattr index values. | ||
| 3. Zero or more {ref}`xattr_entry_record` (inline entries). | ||
|
|
||
| (inline_xattr_body_header)= | ||
| ### Inline Xattr Body Header | ||
| EROF | ||
| | Offset | Size | Type | Name | Description | | ||
| |--------|------|--------|------------------|-------------| | ||
| | 0x00 | 4 | `u32` | `h_name_filter` | Inverted Bloom filter over xattr names; valid when `EROFS_FEATURE_COMPAT_XATTR_FILTER` is set; see {ref}`Xattr Filter <xattr_filter>` | | ||
| | 0x04 | 1 | `u8` | `h_shared_count` | Number of shared xattr index entries | | ||
| | 0x05 | 7 | `u8[]` | _reserved_ | Reserved; must be 0 | | ||
|
|
||
| (xattr_entry_record)= | ||
| ### Xattr Entry Record | ||
|
|
||
| Each inline or shared xattr entry has the following layout: | ||
|
|
||
| | Offset | Size | Type | Name | Description | | ||
| |--------|------|-------|----------------|-------------| | ||
| | 0x00 | 1 | `u8` | `e_name_len` | Length of the name suffix in bytes | | ||
| | 0x01 | 1 | `u8` | `e_name_index` | Namespace index (maps to a prefix string); see below | | ||
| | 0x02 | 2 | `u16` | `e_value_size` | Length of the value in bytes | | ||
|
|
||
| Immediately following the 4-byte xattr entry header: `e_name_len` bytes of name | ||
| suffix, then `e_value_size` bytes of value. The entire entry (header + name + value) | ||
| is padded to a 4-byte boundary. | ||
|
|
||
| (e_name_index-namespace-mapping)= | ||
| #### `e_name_index` Namespace Mapping | ||
|
|
||
| Rather than storing the full namespace prefix string in every entry, EROFS encodes | ||
| the xattr namespace prefix as a 1-byte index. Bit 7 of `e_name_index` is the | ||
| `EROFS_XATTR_LONG_PREFIX` flag. When set, the lower 7 bits index into the long | ||
| xattr name prefix table (see {ref}`long_xattr_prefixes`). | ||
| When clear, the full byte selects one of the built-in short namespace prefixes: | ||
|
|
||
| | Value | Prefix | | ||
| |-------|--------| | ||
| | 1 | `user.` | | ||
| | 2 | `system.posix_acl_access` | | ||
| | 3 | `system.posix_acl_default` | | ||
| | 4 | `trusted.` | | ||
| | 6 | `security.` | | ||
|
|
||
SToPire marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| (shared_xattr_area)= | ||
| ## Shared Xattr Area | ||
|
|
||
| Normally, the shared xattr area begins at block address `xattr_blkaddr`. Each shared | ||
| entry is an xattr entry record stored contiguously in this area. An inode references | ||
| a shared entry by its 32-bit index, stored in the inline xattr region immediately | ||
| after the 12-byte inline xattr header. The index is a byte offset within the | ||
| shared area divided by 4. | ||
|
|
||
| When `EROFS_FEATURE_COMPAT_SHARED_EA_IN_METABOX` is set, the shared xattr pool is | ||
| stored in the metabox inode's decoded data region rather than at `xattr_blkaddr`. | ||
|
|
||
| (long_xattr_prefixes)= | ||
| ## Long Xattr Name Prefixes | ||
|
|
||
| This section applies when `EROFS_FEATURE_INCOMPAT_XATTR_PREFIXES` is set. | ||
|
|
||
| When this feature is set, a table of `xattr_prefix_count` prefix entries is | ||
| present; see {ref}`xattr_prefix_table_placement` for where that table is stored. | ||
| Each entry has the following fixed header, padded together with the | ||
| variable-length `infix` payload to a 4-byte boundary: | ||
|
|
||
| | Offset | Size | Type | Name | Description | | ||
| |--------|------|-------|--------------|-------------| | ||
| | 0x00 | 2 | `u16` | `size` | Byte length of the following content: `base_index` plus the variable-length `infix` payload | | ||
| | 0x02 | 1 | `u8` | `base_index` | Built-in short namespace prefix index (see {ref}`e_name_index-namespace-mapping`) | | ||
|
|
||
| The variable-length `infix` bytes begin at offset `0x03`. Their length is | ||
| `size - 1`, and they are not null-terminated. | ||
|
|
||
| The full reconstructed prefix is the concatenation of the short prefix indicated by | ||
| `base_index` and the `infix` bytes. An xattr entry using a long prefix stores only | ||
| the name suffix after the reconstructed prefix; `e_name_len` counts only those suffix | ||
| bytes. | ||
|
|
||
| For example, an xattr named `trusted.overlay.opaque` can be represented with | ||
| `base_index = 4` (`trusted.`) and `infix = "overlay."`, yielding the full prefix | ||
| `trusted.overlay.`; the stored name suffix is `opaque` with `e_name_len = 6`. | ||
|
|
||
| (xattr_prefix_table_placement)= | ||
| ### Prefix Table Placement | ||
|
|
||
| The xattr prefix table may be: | ||
| - embedded in the metabox or packed inode's data region (when `EROFS_FEATURE_COMPAT_PLAIN_XATTR_PFX` is not set), or | ||
| - stored as a standalone region located by `xattr_prefix_start` (when `EROFS_FEATURE_COMPAT_PLAIN_XATTR_PFX` is set). | ||
|
|
||
| (xattr_filter)= | ||
| ## Xattr Filter | ||
|
|
||
| This section applies when `EROFS_FEATURE_COMPAT_XATTR_FILTER` is set. | ||
|
|
||
| When this feature is set, `h_name_filter` in the inline xattr body header holds a | ||
| 32-bit inverted Bloom filter over the inode's xattr names. Each bit position | ||
| corresponds to one hash bucket: | ||
|
|
||
| - A bit value of **1** guarantees that no xattr present on this inode hashes to that | ||
| bucket, so the queried name is **definitely absent**. | ||
| - A bit value of **0** means a matching xattr **may exist** and a full scan is | ||
| required. | ||
|
|
||
| When `xattr_filter_reserved` in the superblock is non-zero, the Bloom filter is | ||
| disabled unconditionally for all inodes in the image. | ||
|
|
||
| (image_share_xattrs)= | ||
| ## Image-share Xattrs | ||
|
|
||
| This section applies when `EROFS_FEATURE_COMPAT_ISHARE_XATTRS` is set. | ||
|
|
||
| When this feature is set, the superblock field `ishare_xattr_prefix_id` is valid | ||
| and identifies an entry in the long xattr prefix table. Regular files may carry an | ||
| xattr whose name equals the prefix identified by `ishare_xattr_prefix_id` | ||
| (i.e. `e_name_index` selects that entry and `e_name_len` is 0) and whose value is | ||
| a SHA-256 content fingerprint in the form `sha256:<hex-digest>`. | ||
|
|
||
| This convention enables tools to identify files with identical content across | ||
| different EROFS images by comparing these fingerprints. | ||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.