
fix: proper listing of hottier #1565

Merged
nikhilsinhaparseable merged 2 commits into parseablehq:main from parmesant:hottier-bugfix
Mar 2, 2026

Conversation

parmesant (Contributor) commented Mar 2, 2026

Fixes #XXXX.

Description


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Summary by CodeRabbit

  • New Features

    • Hot-tier operations now support tenant-scoped paths, enabling per-tenant hot data access and isolation.
  • Refactor

    • Improved hot-tier manifest retrieval and path resolution for more consistent, reliable lookups.
    • Streamlined date and file discovery to consistently respect tenant context.
  • Bug Fixes

    • More robust filtering of manifest/parquet files to avoid missing or incorrect entries.

coderabbitai bot (Contributor) commented Mar 2, 2026

Walkthrough

Added tenant-scoped behavior to hot-tier operations by threading an optional tenant_id through multiple path/date helpers and manifest functions; refactored get_hot_tier_manifest_files to remove the stream parameter and adjusted call sites accordingly.

Changes

  • Multi-tenant path and date plumbing (src/hottier.rs): Threaded tenant_id: &Option<String> through public and private helpers (e.g., get_stream_path_for_date, fetch_hot_tier_dates, get_oldest_date_time_entry, get_hot_tier_parquet_files, and the public APIs). Path resolution and date lookups now consider the optional tenant scope; call sites were updated to pass tenant_id.
  • Manifest retrieval and filtering (src/hottier.rs): Refactored get_hot_tier_manifest_files (removed the stream parameter) and updated manifest/parquet selection and existence checks to use tenant-aware paths and local file checks; adjusted manifest write paths to include tenant context.
  • Call-site and plan adjustments (src/query/stream_schema_provider.rs): Updated the call to the new get_hot_tier_manifest_files signature, added an explicit Vec<File> annotation for hot_tier_files, and simplified object store URL selection to "file:///" (removed the tenant conditional).
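The tenant-scoped path resolution described above can be sketched roughly as follows. This is a minimal illustration only: the helper name get_stream_path_for_date and the `tenant_id: &Option<String>` shape come from the walkthrough, but the directory layout and exact signature are assumptions, not the actual Parseable implementation.

```rust
use std::path::{Path, PathBuf};

// Sketch: resolve the hot-tier directory for a stream and date, optionally
// scoped under a tenant. Layout (<root>[/<tenant>]/<stream>/date=<date>)
// is illustrative.
fn get_stream_path_for_date(
    hot_tier_path: &Path,
    tenant_id: &Option<String>,
    stream: &str,
    date: &str, // "YYYY-MM-DD"
) -> PathBuf {
    let mut path = hot_tier_path.to_path_buf();
    if let Some(tenant) = tenant_id {
        // Tenant-scoped: insert the tenant segment before the stream.
        path = path.join(tenant);
    }
    path.join(stream).join(format!("date={date}"))
}
```

With tenant_id set to None the path degenerates to the single-tenant layout, which is why threading the option through every helper keeps old call sites working.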

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • nikhilsinhaparseable

Poem

🐇
I hopped through folders, small and wide,
Tenant names tucked safely inside,
Manifests sorted by date and name,
Streams now find their scoped home game,
A little thump — hot-tier's tad refined! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check ⚠️ Warning: The pull request description contains only template placeholders with no actual content filled in; the issue reference, goals, solutions, key changes, and testing status are all missing or unchecked. Resolution: fill in the description with the actual issue being fixed, explain the goals and rationale for the tenant-scoping changes, summarize key changes, and check off or explain the testing and documentation status.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title 'fix: proper listing of hottier' is concise and clearly relates to the main change, fixing hot-tier file listing behavior by introducing tenant-scoped operations.
  • Docstring Coverage ✅ Passed: Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/hottier.rs (1)

471-506: ⚠️ Potential issue | 🟡 Minor

Potential panic on malformed directory name.

Line 500 uses unwrap() on NaiveDate::parse_from_str. If a directory exists that doesn't follow the date=YYYY-MM-DD naming convention (e.g., a manually created folder or corrupted state), this will panic.

Also, line 482 contains dead commented code that should be removed.

Proposed fix to handle parse errors gracefully
-            // let path = self.hot_tier_path.join(stream);
         if !path.exists() {
             return Ok(date_list);
         }
 
         let directories = fs::read_dir(&path).await?;
         let mut dates = ReadDirStream::new(directories);
         while let Some(date) = dates.next().await {
             let date = date?;
             if !date.path().is_dir() {
                 continue;
             }
-            let date = NaiveDate::parse_from_str(
+            let Ok(date) = NaiveDate::parse_from_str(
                 date.file_name()
                     .to_string_lossy()
                     .trim_start_matches("date="),
                 "%Y-%m-%d",
-            )
-            .unwrap();
-            date_list.push(date);
+            ) else {
+                continue;
+            };
+            date_list.push(date);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hottier.rs` around lines 471 - 506, fetch_hot_tier_dates currently calls
NaiveDate::parse_from_str(...).unwrap(), which can panic if a directory name
doesn't match "date=YYYY-MM-DD"; replace the unwrap by handling the parse Result
(e.g., skip invalid entries with a warning via the function's error type or log
and continue) so malformed directories do not crash the function, and push only
successfully parsed NaiveDate values into date_list; also remove the dead
commented line that joins hot_tier_path with stream (the commented-out let path
= ... at the start of the function) to clean up the code.
🧹 Nitpick comments (1)
src/query/stream_schema_provider.rs (1)

281-286: Consider removing commented-out code.

The commented block (lines 281-285) describes a previous approach to tenant-based URL construction that is no longer used. Since LocalFS is single-tenant only and tenant_id is not used for path construction, this dead code could be removed to improve readability.

Proposed cleanup
-        // // NOTE: There is the possibility of a parquet file being pushed to object store
-        // // and deleted from staging in the time it takes for datafusion to get to it.
-        // // Staging parquet execution plan
-        // let object_store_url = if let Some(tenant_id) = self.tenant_id.as_ref() {
-        //     &format!("file://{tenant_id}/")
-        // } else {
-        //     "file:///"
-        // };
+        // NOTE: There is the possibility of a parquet file being pushed to object store
+        // and deleted from staging in the time it takes for datafusion to get to it.
         let object_store_url = "file:///";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/query/stream_schema_provider.rs` around lines 281 - 286, Remove the dead
commented-out tenant-based URL construction in stream_schema_provider.rs: delete
the commented block that references self.tenant_id and the formatted
"file://{tenant_id}/" lines, leaving the existing let object_store_url =
"file:///"; declaration; this cleans up unused code around object_store_url and
clarifies that LocalFS is single-tenant and tenant_id is not used for path
construction.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7b01e3 and 1545af7.

📒 Files selected for processing (2)
  • src/hottier.rs
  • src/query/stream_schema_provider.rs

coderabbitai bot previously approved these changes Mar 2, 2026

coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/query/stream_schema_provider.rs (1)

209-219: ⚠️ Potential issue | 🟡 Minor

Avoid unwrap() when converting hot-tier file paths during plan construction.

path.to_str().unwrap() can panic on non-UTF8 paths. Convert the closure to a loop to enable proper error propagation with DataFusionError.

🛠️ Proposed fix
-        let hot_tier_files: Vec<File> = hot_tier_files
-            .into_iter()
-            .map(|mut file| {
-                let path = PARSEABLE
-                    .options
-                    .hot_tier_storage_path
-                    .as_ref()
-                    .unwrap()
-                    .join(&file.file_path);
-                file.file_path = path.to_str().unwrap().to_string();
-                file
-            })
-            .collect();
+        let mut resolved_hot_tier_files: Vec<File> = Vec::with_capacity(hot_tier_files.len());
+        for mut file in hot_tier_files {
+            let path = PARSEABLE
+                .options
+                .hot_tier_storage_path
+                .as_ref()
+                .unwrap()
+                .join(&file.file_path);
+            let path_str = path.to_str().ok_or_else(|| {
+                DataFusionError::Execution(format!(
+                    "hot tier path is not valid UTF-8: {}",
+                    path.display()
+                ))
+            })?;
+            file.file_path = path_str.to_string();
+            resolved_hot_tier_files.push(file);
+        }
+        let hot_tier_files = resolved_hot_tier_files;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/query/stream_schema_provider.rs` around lines 209 - 219, The code
converts hot_tier_files inside a map closure and calls path.to_str().unwrap(),
which can panic on non-UTF8 paths; replace the map closure with an explicit loop
over hot_tier_files so you can return a Result and propagate errors using
DataFusionError (e.g., DataFusionError::Execution or a suitable variant) when
path.to_str() returns None or when hot_tier_storage_path is missing; update the
transformation of each File by joining
PARSEABLE.options.hot_tier_storage_path.as_ref() with file.file_path, attempt to
convert the joined Path to a UTF-8 string without unwrap, and return/propagate a
DataFusionError on failure instead of panicking.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/hottier.rs`:
- Around line 558-573: The current manifest_files.retain closure uses blocking
Path::exists and std::fs::metadata which runs inside an async query path;
replace that synchronous retain with an async-aware loop: iterate over
manifest_files (e.g., with a for loop), construct hot_tier_path for each file,
call tokio::fs::metadata(&hot_tier_path).await (handle Ok/Err), compare
meta.len() to file.file_size, push matching entries into hot_tier_files and skip
adding them to a new manifest_files vector, otherwise push them into the new
vector; finally replace manifest_files with the new vector. Update the code
around the manifest_files.retain closure in hottier.rs to use this async
iteration so Path::exists and std::fs::metadata are not used on the Tokio
runtime.
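The replacement the prompt above asks for, splitting manifest entries into those present locally (size matches on disk) and those that must remain in the manifest, can be sketched as below. Synchronous std::fs is used here so the example is self-contained; in the real async query path each lookup would be tokio::fs::metadata(&path).await instead. The ManifestFile type and its fields are stand-ins for Parseable's actual File type.

```rust
use std::fs;
use std::path::Path;

// Hypothetical manifest entry; the real type lives in Parseable's
// manifest module and may differ.
#[derive(Debug, PartialEq)]
struct ManifestFile {
    file_path: String,
    file_size: u64,
}

// Split manifest entries: a file whose local copy exists with the expected
// size is served from the hot tier; everything else stays in the manifest.
// In async code, replace fs::metadata with tokio::fs::metadata(..).await so
// no blocking filesystem call runs on the Tokio runtime.
fn partition_hot_tier(
    hot_tier_root: &Path,
    manifest_files: Vec<ManifestFile>,
) -> (Vec<ManifestFile>, Vec<ManifestFile>) {
    let mut hot_tier_files = Vec::new();
    let mut remaining = Vec::new();
    for file in manifest_files {
        let local_path = hot_tier_root.join(&file.file_path);
        match fs::metadata(&local_path) {
            // Size check guards against serving a partially written file.
            Ok(meta) if meta.len() == file.file_size => hot_tier_files.push(file),
            _ => remaining.push(file),
        }
    }
    (hot_tier_files, remaining)
}
```

Building two vectors in one pass replaces the retain closure entirely, which is what makes the await-per-file version possible (retain takes a synchronous predicate).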


ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1545af7 and 8e31e8b.

📒 Files selected for processing (2)
  • src/hottier.rs
  • src/query/stream_schema_provider.rs

coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/hottier.rs (1)

494-500: ⚠️ Potential issue | 🟡 Minor

Avoid panic on malformed date directories.

Line 500 uses unwrap() during date parsing. A single unexpected directory name will panic this flow; skip invalid entries instead of aborting.

💡 Suggested fix
-            let date = NaiveDate::parse_from_str(
-                date.file_name()
-                    .to_string_lossy()
-                    .trim_start_matches("date="),
-                "%Y-%m-%d",
-            )
-            .unwrap();
-            date_list.push(date);
+            if let Ok(parsed_date) = NaiveDate::parse_from_str(
+                date.file_name()
+                    .to_string_lossy()
+                    .trim_start_matches("date="),
+                "%Y-%m-%d",
+            ) {
+                date_list.push(parsed_date);
+            } else {
+                warn!("Skipping invalid hot-tier date directory: {:?}", date.file_name());
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/hottier.rs` around lines 494 - 500, The code currently calls
NaiveDate::parse_from_str(...).unwrap() on values derived from date.file_name()
which will panic on malformed directory names; change this to attempt parsing
and skip invalid entries instead: replace the unwrap with handling like if let
Ok(parsed_date) = NaiveDate::parse_from_str(... ) { /* use parsed_date */ } else
{ continue; } (or log the invalid name and continue) so the loop does not abort
on a single bad directory. Ensure you use the same expressions
date.file_name().to_string_lossy().trim_start_matches(...) when attempting to
parse so the branch matches the original behavior.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e31e8b and a5083e5.

📒 Files selected for processing (1)
  • src/hottier.rs

@nikhilsinhaparseable nikhilsinhaparseable merged commit 08073d6 into parseablehq:main Mar 2, 2026
12 checks passed