
fix: proper listing of hottier #1565

Merged
nikhilsinhaparseable merged 2 commits into parseablehq:main from parmesant:hottier-bugfix
Mar 2, 2026

Conversation

parmesant (Contributor) commented Mar 2, 2026

Fixes #XXXX.

Description


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Summary by CodeRabbit

  • New Features

    • Hot-tier operations now support tenant-scoped paths, enabling per-tenant hot data access and isolation.
  • Refactor

    • Improved hot-tier manifest retrieval and path resolution for more consistent, reliable lookups.
    • Streamlined date and file discovery to consistently respect tenant context.
  • Bug Fixes

    • More robust filtering of manifest/parquet files to avoid missing or incorrect entries.

coderabbitai bot (Contributor) commented Mar 2, 2026

Walkthrough

Added tenant-scoped behavior to hot-tier operations by threading an optional tenant_id through multiple path/date helpers and manifest functions; refactored get_hot_tier_manifest_files to remove the stream parameter and adjusted call sites accordingly.

Changes

  • Multi-tenant path and date plumbing (src/hottier.rs): Threaded tenant_id: &Option<String> through public and private helpers (e.g., get_stream_path_for_date, fetch_hot_tier_dates, get_oldest_date_time_entry, get_hot_tier_parquet_files, and the public APIs). Path resolution and date lookups now consider the optional tenant scope; call sites were updated to pass tenant_id.
  • Manifest retrieval and filtering (src/hottier.rs): Refactored get_hot_tier_manifest_files (removed the stream parameter) and updated manifest/parquet selection and existence checks to use tenant-aware paths and local file checks; adjusted manifest write paths to include tenant context.
  • Call-site and plan adjustments (src/query/stream_schema_provider.rs): Updated the call to the new get_hot_tier_manifest_files signature, added an explicit Vec<File> annotation for hot_tier_files, and simplified object store URL selection to "file:///" (removed the tenant conditional).
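The tenant-scoped path resolution described above can be sketched roughly as follows. This is a minimal illustration only: the helper name get_stream_path_for_date and the `tenant_id: &Option<String>` shape come from the walkthrough, but the directory layout and exact signature are assumptions, not the actual Parseable implementation.

```rust
use std::path::{Path, PathBuf};

// Sketch: resolve the hot-tier directory for a stream and date, optionally
// scoped under a tenant. Layout (<root>[/<tenant>]/<stream>/date=<date>)
// is illustrative.
fn get_stream_path_for_date(
    hot_tier_path: &Path,
    tenant_id: &Option<String>,
    stream: &str,
    date: &str, // "YYYY-MM-DD"
) -> PathBuf {
    let mut path = hot_tier_path.to_path_buf();
    if let Some(tenant) = tenant_id {
        // Tenant-scoped: insert the tenant segment before the stream.
        path = path.join(tenant);
    }
    path.join(stream).join(format!("date={date}"))
}
```

With tenant_id set to None the path degenerates to the single-tenant layout, which is why threading the option through every helper keeps old call sites working.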

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • nikhilsinhaparseable

Poem

🐇
I hopped through folders, small and wide,
Tenant names tucked safely inside,
Manifests sorted by date and name,
Streams now find their scoped home game,
A little thump — hot-tier's tad refined! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check ⚠️ Warning: The pull request description contains only template placeholders with no actual content filled in; the issue reference, goals, solutions, key changes, and testing status are all missing or unchecked. Resolution: fill in the description with the actual issue being fixed, explain the goals and rationale for the tenant-scoping changes, summarize key changes, and check off or explain the testing and documentation status.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title 'fix: proper listing of hottier' is concise and clearly relates to the main change, fixing hot-tier file listing behavior by introducing tenant-scoped operations.
  • Docstring Coverage ✅ Passed: Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/hottier.rs (1)

471-506: ⚠️ Potential issue | 🟡 Minor

Potential panic on malformed directory name.

Line 500 uses unwrap() on NaiveDate::parse_from_str. If a directory exists that doesn't follow the date=YYYY-MM-DD naming convention (e.g., a manually created folder or corrupted state), this will panic.

Also, line 482 contains dead commented code that should be removed.

Proposed fix to handle parse errors gracefully
-            // let path = self.hot_tier_path.join(stream);
         if !path.exists() {
             return Ok(date_list);
         }
 
         let directories = fs::read_dir(&path).await?;
         let mut dates = ReadDirStream::new(directories);
         while let Some(date) = dates.next().await {
             let date = date?;
             if !date.path().is_dir() {
                 continue;
             }
-            let date = NaiveDate::parse_from_str(
+            let Ok(date) = NaiveDate::parse_from_str(
                 date.file_name()
                     .to_string_lossy()
                     .trim_start_matches("date="),
                 "%Y-%m-%d",
-            )
-            .unwrap();
-            date_list.push(date);
+            ) else {
+                continue;
+            };
+            date_list.push(date);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hottier.rs` around lines 471 - 506, fetch_hot_tier_dates currently calls
NaiveDate::parse_from_str(...).unwrap(), which can panic if a directory name
doesn't match "date=YYYY-MM-DD"; replace the unwrap by handling the parse Result
(e.g., skip invalid entries with a warning via the function's error type or log
and continue) so malformed directories do not crash the function, and push only
successfully parsed NaiveDate values into date_list; also remove the dead
commented line that joins hot_tier_path with stream (the commented-out let path
= ... at the start of the function) to clean up the code.
🧹 Nitpick comments (1)
src/query/stream_schema_provider.rs (1)

281-286: Consider removing commented-out code.

The commented block (lines 281-285) describes a previous approach to tenant-based URL construction that is no longer used. Since LocalFS is single-tenant only and tenant_id is not used for path construction, this dead code could be removed to improve readability.

Proposed cleanup
-        // // NOTE: There is the possibility of a parquet file being pushed to object store
-        // // and deleted from staging in the time it takes for datafusion to get to it.
-        // // Staging parquet execution plan
-        // let object_store_url = if let Some(tenant_id) = self.tenant_id.as_ref() {
-        //     &format!("file://{tenant_id}/")
-        // } else {
-        //     "file:///"
-        // };
+        // NOTE: There is the possibility of a parquet file being pushed to object store
+        // and deleted from staging in the time it takes for datafusion to get to it.
         let object_store_url = "file:///";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/query/stream_schema_provider.rs` around lines 281 - 286, Remove the dead
commented-out tenant-based URL construction in stream_schema_provider.rs: delete
the commented block that references self.tenant_id and the formatted
"file://{tenant_id}/" lines, leaving the existing let object_store_url =
"file:///"; declaration; this cleans up unused code around object_store_url and
clarifies that LocalFS is single-tenant and tenant_id is not used for path
construction.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7b01e3 and 1545af7.

📒 Files selected for processing (2)
  • src/hottier.rs
  • src/query/stream_schema_provider.rs

coderabbitai bot previously approved these changes Mar 2, 2026

coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/query/stream_schema_provider.rs (1)

209-219: ⚠️ Potential issue | 🟡 Minor

Avoid unwrap() when converting hot-tier file paths during plan construction.

path.to_str().unwrap() can panic on non-UTF8 paths. Convert the closure to a loop to enable proper error propagation with DataFusionError.

🛠️ Proposed fix
-        let hot_tier_files: Vec<File> = hot_tier_files
-            .into_iter()
-            .map(|mut file| {
-                let path = PARSEABLE
-                    .options
-                    .hot_tier_storage_path
-                    .as_ref()
-                    .unwrap()
-                    .join(&file.file_path);
-                file.file_path = path.to_str().unwrap().to_string();
-                file
-            })
-            .collect();
+        let mut resolved_hot_tier_files: Vec<File> = Vec::with_capacity(hot_tier_files.len());
+        for mut file in hot_tier_files {
+            let path = PARSEABLE
+                .options
+                .hot_tier_storage_path
+                .as_ref()
+                .unwrap()
+                .join(&file.file_path);
+            let path_str = path.to_str().ok_or_else(|| {
+                DataFusionError::Execution(format!(
+                    "hot tier path is not valid UTF-8: {}",
+                    path.display()
+                ))
+            })?;
+            file.file_path = path_str.to_string();
+            resolved_hot_tier_files.push(file);
+        }
+        let hot_tier_files = resolved_hot_tier_files;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/query/stream_schema_provider.rs` around lines 209 - 219, The code
converts hot_tier_files inside a map closure and calls path.to_str().unwrap(),
which can panic on non-UTF8 paths; replace the map closure with an explicit loop
over hot_tier_files so you can return a Result and propagate errors using
DataFusionError (e.g., DataFusionError::Execution or a suitable variant) when
path.to_str() returns None or when hot_tier_storage_path is missing; update the
transformation of each File by joining
PARSEABLE.options.hot_tier_storage_path.as_ref() with file.file_path, attempt to
convert the joined Path to a UTF-8 string without unwrap, and return/propagate a
DataFusionError on failure instead of panicking.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/hottier.rs`:
- Around line 558-573: The current manifest_files.retain closure uses blocking
Path::exists and std::fs::metadata which runs inside an async query path;
replace that synchronous retain with an async-aware loop: iterate over
manifest_files (e.g., with a for loop), construct hot_tier_path for each file,
call tokio::fs::metadata(&hot_tier_path).await (handle Ok/Err), compare
meta.len() to file.file_size, push matching entries into hot_tier_files and skip
adding them to a new manifest_files vector, otherwise push them into the new
vector; finally replace manifest_files with the new vector. Update the code
around the manifest_files.retain closure in hottier.rs to use this async
iteration so Path::exists and std::fs::metadata are not used on the Tokio
runtime.
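The replacement the prompt above asks for, splitting manifest entries into those present locally (size matches on disk) and those that must remain in the manifest, can be sketched as below. Synchronous std::fs is used here so the example is self-contained; in the real async query path each lookup would be tokio::fs::metadata(&path).await instead. The ManifestFile type and its fields are stand-ins for Parseable's actual File type.

```rust
use std::fs;
use std::path::Path;

// Hypothetical manifest entry; the real type lives in Parseable's
// manifest module and may differ.
#[derive(Debug, PartialEq)]
struct ManifestFile {
    file_path: String,
    file_size: u64,
}

// Split manifest entries: a file whose local copy exists with the expected
// size is served from the hot tier; everything else stays in the manifest.
// In async code, replace fs::metadata with tokio::fs::metadata(..).await so
// no blocking filesystem call runs on the Tokio runtime.
fn partition_hot_tier(
    hot_tier_root: &Path,
    manifest_files: Vec<ManifestFile>,
) -> (Vec<ManifestFile>, Vec<ManifestFile>) {
    let mut hot_tier_files = Vec::new();
    let mut remaining = Vec::new();
    for file in manifest_files {
        let local_path = hot_tier_root.join(&file.file_path);
        match fs::metadata(&local_path) {
            // Size check guards against serving a partially written file.
            Ok(meta) if meta.len() == file.file_size => hot_tier_files.push(file),
            _ => remaining.push(file),
        }
    }
    (hot_tier_files, remaining)
}
```

Building two vectors in one pass replaces the retain closure entirely, which is what makes the await-per-file version possible (retain takes a synchronous predicate).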


ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1545af7 and 8e31e8b.

📒 Files selected for processing (2)
  • src/hottier.rs
  • src/query/stream_schema_provider.rs

coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/hottier.rs (1)

494-500: ⚠️ Potential issue | 🟡 Minor

Avoid panic on malformed date directories.

Line 500 uses unwrap() during date parsing. A single unexpected directory name will panic this flow; skip invalid entries instead of aborting.

💡 Suggested fix
-            let date = NaiveDate::parse_from_str(
-                date.file_name()
-                    .to_string_lossy()
-                    .trim_start_matches("date="),
-                "%Y-%m-%d",
-            )
-            .unwrap();
-            date_list.push(date);
+            if let Ok(parsed_date) = NaiveDate::parse_from_str(
+                date.file_name()
+                    .to_string_lossy()
+                    .trim_start_matches("date="),
+                "%Y-%m-%d",
+            ) {
+                date_list.push(parsed_date);
+            } else {
+                warn!("Skipping invalid hot-tier date directory: {:?}", date.file_name());
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hottier.rs` around lines 494 - 500, The code currently calls
NaiveDate::parse_from_str(...).unwrap() on values derived from date.file_name()
which will panic on malformed directory names; change this to attempt parsing
and skip invalid entries instead: replace the unwrap with handling like if let
Ok(parsed_date) = NaiveDate::parse_from_str(... ) { /* use parsed_date */ } else
{ continue; } (or log the invalid name and continue) so the loop does not abort
on a single bad directory. Ensure you use the same expressions
date.file_name().to_string_lossy().trim_start_matches(...) when attempting to
parse so the branch matches the original behavior.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e31e8b and a5083e5.

📒 Files selected for processing (1)
  • src/hottier.rs

@nikhilsinhaparseable nikhilsinhaparseable merged commit 08073d6 into parseablehq:main Mar 2, 2026
12 checks passed