fix: proper listing of hottier #1565
Walkthrough
Added tenant-scoped behavior to hot-tier operations by threading an optional …

Changes
Sequence Diagram(s)
(omitted)

Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/hottier.rs (1)
471-506: ⚠️ Potential issue | 🟡 Minor

Potential panic on malformed directory name.

Line 500 uses `unwrap()` on `NaiveDate::parse_from_str`. If a directory exists that doesn't follow the `date=YYYY-MM-DD` naming convention (e.g., a manually created folder or corrupted state), this will panic. Also, line 482 contains dead commented code that should be removed.
Proposed fix to handle parse errors gracefully:

```diff
-        // let path = self.hot_tier_path.join(stream);
         if !path.exists() {
             return Ok(date_list);
         }
         let directories = fs::read_dir(&path).await?;
         let mut dates = ReadDirStream::new(directories);
         while let Some(date) = dates.next().await {
             let date = date?;
             if !date.path().is_dir() {
                 continue;
             }
-            let date = NaiveDate::parse_from_str(
+            let Ok(date) = NaiveDate::parse_from_str(
                 date.file_name()
                     .to_string_lossy()
                     .trim_start_matches("date="),
                 "%Y-%m-%d",
-            )
-            .unwrap();
-            date_list.push(date);
+            ) else {
+                continue;
+            };
+            date_list.push(date);
         }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/hottier.rs` around lines 471 - 506, fetch_hot_tier_dates currently calls NaiveDate::parse_from_str(...).unwrap(), which can panic if a directory name doesn't match "date=YYYY-MM-DD"; replace the unwrap by handling the parse Result (e.g., skip invalid entries with a warning via the function's error type or log and continue) so malformed directories do not crash the function, and push only successfully parsed NaiveDate values into date_list; also remove the dead commented line that joins hot_tier_path with stream (the commented-out let path = ... at the start of the function) to clean up the code.
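The skip-on-parse-failure pattern proposed above can be sketched standalone. The real code uses `chrono::NaiveDate::parse_from_str`; this sketch parses the `date=YYYY-MM-DD` directory-name convention by hand so it needs no external crates, and `parse_date_dir` / `collect_dates` are hypothetical helper names, not functions from the codebase.

```rust
// Parse one directory name of the form "date=YYYY-MM-DD", returning None for
// anything malformed instead of panicking.
fn parse_date_dir(name: &str) -> Option<(i32, u32, u32)> {
    let raw = name.strip_prefix("date=")?;
    let mut parts = raw.splitn(3, '-');
    let year: i32 = parts.next()?.parse().ok()?;
    let month: u32 = parts.next()?.parse().ok()?;
    let day: u32 = parts.next()?.parse().ok()?;
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some((year, month, day))
}

// `filter_map` silently drops entries that fail to parse -- the iterator
// equivalent of the `let Ok(..) = .. else { continue; }` fix in the diff.
fn collect_dates(dir_names: &[&str]) -> Vec<(i32, u32, u32)> {
    dir_names.iter().filter_map(|n| parse_date_dir(n)).collect()
}
```

A stray `lost+found` or hand-created folder in the hot-tier directory now simply gets skipped rather than aborting the whole listing.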
🧹 Nitpick comments (1)
src/query/stream_schema_provider.rs (1)
281-286: Consider removing commented-out code.

The commented block (lines 281-285) describes a previous approach to tenant-based URL construction that is no longer used. Since LocalFS is single-tenant only and tenant_id is not used for path construction, this dead code could be removed to improve readability.
Proposed cleanup:

```diff
-    // // NOTE: There is the possibility of a parquet file being pushed to object store
-    // // and deleted from staging in the time it takes for datafusion to get to it.
-    // // Staging parquet execution plan
-    // let object_store_url = if let Some(tenant_id) = self.tenant_id.as_ref() {
-    //     &format!("file://{tenant_id}/")
-    // } else {
-    //     "file:///"
-    // };
+    // NOTE: There is the possibility of a parquet file being pushed to object store
+    // and deleted from staging in the time it takes for datafusion to get to it.
     let object_store_url = "file:///";
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/query/stream_schema_provider.rs` around lines 281 - 286, Remove the dead commented-out tenant-based URL construction in stream_schema_provider.rs: delete the commented block that references self.tenant_id and the formatted "file://{tenant_id}/" lines, leaving the existing let object_store_url = "file:///"; declaration; this cleans up unused code around object_store_url and clarifies that LocalFS is single-tenant and tenant_id is not used for path construction.
Force-pushed from 1545af7 to 8e31e8b (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/query/stream_schema_provider.rs (1)
209-219: ⚠️ Potential issue | 🟡 Minor

Avoid `unwrap()` when converting hot-tier file paths during plan construction. `path.to_str().unwrap()` can panic on non-UTF8 paths. Convert the closure to a loop to enable proper error propagation with `DataFusionError`.

🛠️ Proposed fix
```diff
-        let hot_tier_files: Vec<File> = hot_tier_files
-            .into_iter()
-            .map(|mut file| {
-                let path = PARSEABLE
-                    .options
-                    .hot_tier_storage_path
-                    .as_ref()
-                    .unwrap()
-                    .join(&file.file_path);
-                file.file_path = path.to_str().unwrap().to_string();
-                file
-            })
-            .collect();
+        let mut resolved_hot_tier_files: Vec<File> = Vec::with_capacity(hot_tier_files.len());
+        for mut file in hot_tier_files {
+            let path = PARSEABLE
+                .options
+                .hot_tier_storage_path
+                .as_ref()
+                .unwrap()
+                .join(&file.file_path);
+            let path_str = path.to_str().ok_or_else(|| {
+                DataFusionError::Execution(format!(
+                    "hot tier path is not valid UTF-8: {}",
+                    path.display()
+                ))
+            })?;
+            file.file_path = path_str.to_string();
+            resolved_hot_tier_files.push(file);
+        }
+        let hot_tier_files = resolved_hot_tier_files;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/query/stream_schema_provider.rs` around lines 209 - 219, The code converts hot_tier_files inside a map closure and calls path.to_str().unwrap(), which can panic on non-UTF8 paths; replace the map closure with an explicit loop over hot_tier_files so you can return a Result and propagate errors using DataFusionError (e.g., DataFusionError::Execution or a suitable variant) when path.to_str() returns None or when hot_tier_storage_path is missing; update the transformation of each File by joining PARSEABLE.options.hot_tier_storage_path.as_ref() with file.file_path, attempt to convert the joined Path to a UTF-8 string without unwrap, and return/propagate a DataFusionError on failure instead of panicking.
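The map-closure-to-loop rewrite suggested above can be shown in isolation. This is a sketch, not the actual code: the `File` struct is a simplified stand-in for the type in stream_schema_provider.rs, and a plain `String` stands in for `DataFusionError` so the example compiles with the standard library alone.

```rust
use std::path::{Path, PathBuf};

// Simplified stand-in for the `File` type used in stream_schema_provider.rs.
#[derive(Debug)]
struct File {
    file_path: String,
}

// Replace `.map(|f| .. unwrap() ..)` with an explicit loop so a non-UTF-8
// path becomes a recoverable error instead of a panic.
fn resolve_paths(root: &Path, files: Vec<File>) -> Result<Vec<File>, String> {
    let mut resolved = Vec::with_capacity(files.len());
    for mut file in files {
        let path: PathBuf = root.join(&file.file_path);
        // `Path::to_str` returns None when the path is not valid UTF-8;
        // propagate that with `?` instead of unwrapping.
        let path_str = path
            .to_str()
            .ok_or_else(|| format!("path is not valid UTF-8: {}", path.display()))?;
        file.file_path = path_str.to_string();
        resolved.push(file);
    }
    Ok(resolved)
}
```

The key point is that a loop body can use `?`, which a closure passed to `map` cannot do without collecting into a `Result<Vec<_>, _>`.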
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/hottier.rs`:
- Around line 558-573: The current manifest_files.retain closure uses blocking
Path::exists and std::fs::metadata which runs inside an async query path;
replace that synchronous retain with an async-aware loop: iterate over
manifest_files (e.g., with a for loop), construct hot_tier_path for each file,
call tokio::fs::metadata(&hot_tier_path).await (handle Ok/Err), compare
meta.len() to file.file_size, push matching entries into hot_tier_files and skip
adding them to a new manifest_files vector, otherwise push them into the new
vector; finally replace manifest_files with the new vector. Update the code
around the manifest_files.retain closure in hottier.rs to use this async
iteration so Path::exists and std::fs::metadata are not used on the Tokio
runtime.
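The retain-to-partition rewrite described in that prompt can be sketched as follows. `ManifestFile` is a simplified stand-in for the manifest entry type, and the real fix would await `tokio::fs::metadata(&hot_tier_path)` inside the loop so the Tokio runtime is never blocked; this sketch uses `std::fs::metadata` only so it runs standalone without a tokio dependency.

```rust
use std::fs;
use std::path::Path;

// Simplified stand-in for the manifest entry type in hottier.rs.
struct ManifestFile {
    file_path: String,
    file_size: u64,
}

// Partition manifest entries instead of mutating them with a blocking
// `retain` closure: a file already present in the hot tier with a matching
// size goes into `hot_tier_files`; everything else stays in the remaining
// list to be fetched from object storage.
fn partition_files(
    hot_tier_root: &Path,
    manifest_files: Vec<ManifestFile>,
) -> (Vec<ManifestFile>, Vec<ManifestFile>) {
    let mut hot_tier_files = Vec::new();
    let mut remaining = Vec::new();
    for file in manifest_files {
        let hot_tier_path = hot_tier_root.join(&file.file_path);
        // In the async version this would be `tokio::fs::metadata(..).await`.
        match fs::metadata(&hot_tier_path) {
            Ok(meta) if meta.len() == file.file_size => hot_tier_files.push(file),
            _ => remaining.push(file),
        }
    }
    (hot_tier_files, remaining)
}
```

Because `retain` takes a synchronous closure, it cannot await; an explicit `for` loop is the usual way to thread async filesystem calls through a filtering pass.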
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/hottier.rs (1)
494-500: ⚠️ Potential issue | 🟡 Minor

Avoid panic on malformed date directories.

Line 500 uses `unwrap()` during date parsing. A single unexpected directory name will panic this flow; skip invalid entries instead of aborting.

💡 Suggested fix
```diff
-            let date = NaiveDate::parse_from_str(
-                date.file_name()
-                    .to_string_lossy()
-                    .trim_start_matches("date="),
-                "%Y-%m-%d",
-            )
-            .unwrap();
-            date_list.push(date);
+            if let Ok(parsed_date) = NaiveDate::parse_from_str(
+                date.file_name()
+                    .to_string_lossy()
+                    .trim_start_matches("date="),
+                "%Y-%m-%d",
+            ) {
+                date_list.push(parsed_date);
+            } else {
+                warn!("Skipping invalid hot-tier date directory: {:?}", date.file_name());
+            }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/hottier.rs` around lines 494 - 500, The code currently calls NaiveDate::parse_from_str(...).unwrap() on values derived from date.file_name() which will panic on malformed directory names; change this to attempt parsing and skip invalid entries instead: replace the unwrap with handling like if let Ok(parsed_date) = NaiveDate::parse_from_str(... ) { /* use parsed_date */ } else { continue; } (or log the invalid name and continue) so the loop does not abort on a single bad directory. Ensure you use the same expressions date.file_name().to_string_lossy().trim_start_matches(...) when attempting to parse so the branch matches the original behavior.
Fixes #XXXX.
Description
This PR has:
Summary by CodeRabbit
New Features
Refactor
Bug Fixes