LCORE-1539: Add agentic hermeto simulation to the rag-content repo#105
LCORE-1539: Add agentic hermeto simulation to the rag-content repo#105syedriko wants to merge 1 commit intolightspeed-core:mainfrom
Conversation
WalkthroughAdds Hermeto-driven hermetic dependency prefetching and local hermetic build simulation: new Hermeto prefetch manifests, Makefile targets and .gitignore entries, a detailed AGENTS.md operational guide, and scripts to run Hermeto, stage outputs for Konflux/cachi2, generate simulated Containerfiles, and simulate hermetic builds for CPU/CUDA. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Make as Makefile
participant Runner as run_hermeto_fetch_deps.sh
participant ContainerEngine
participant Hermeto as Hermeto Container
participant Out as HERMETO_OUT
participant Stager as stage_hermetic_build_context.sh
participant Staging as .hermetic-staging
participant Gen as gen_containerfile_hermetic_sim.sh
participant Simulator as simulate_hermetic_build.sh
User->>Make: make hermeto-verify-pip-cpu
Make->>Runner: ./scripts/run_hermeto_fetch_deps.sh pip-cpu
Runner->>ContainerEngine: start container with hermeto image and manifest
ContainerEngine->>Hermeto: exec fetch-deps (manifest)
Hermeto->>Out: write .build-config.json and deps/
Runner->>User: exit (success/failure)
User->>Stager: ./scripts/stage_hermetic_build_context.sh --hermeto-out $HERMETO_OUT
Stager->>Out: read .build-config.json and deps/
Stager->>Staging: create cachi2/output and patched-requirements
Stager->>User: print staged filenames
User->>Simulator: ./scripts/simulate_hermetic_build.sh cpu
Simulator->>Stager: stage_hermetic_build_context.sh
Simulator->>Gen: gen_containerfile_hermetic_sim.sh cpu -> Containerfile.sim
Simulator->>ContainerEngine: build image using Containerfile.sim and .hermetic-staging
ContainerEngine->>Simulator: produce rag-content-hermetic-sim:cpu
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (2)
Makefile (1)
109-120: Declarehermeto-fetch-depsas phony.
hermeto-fetch-depsis a command target but not in.PHONY, so a same-named file can shadow it.♻️ Proposed fix
-.PHONY: hermeto-verify-pip-cpu hermeto-verify-pip-cuda hermeto-verify-pip +.PHONY: hermeto-fetch-deps hermeto-verify-pip-cpu hermeto-verify-pip-cuda hermeto-verify-pip🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 109 - 120, The Makefile target hermeto-fetch-deps is not listed as phony and can be shadowed by a file of the same name; add hermeto-fetch-deps to the .PHONY declaration (alongside hermeto-verify-pip-cpu hermeto-verify-pip-cuda hermeto-verify-pip) so the Makefile always treats the hermeto-fetch-deps target as a command rather than a file. Ensure the .PHONY line includes the exact token hermeto-fetch-deps.scripts/run_hermeto_fetch_deps.sh (1)
21-21: Use an immutable digest instead of:latestfor Hermeto image pinning.Line 21 currently uses
:latest, which is mutable and makes builds non-reproducible. Docker best practices recommend pinning to an immutable digest for deterministic CI and local behavior. Replace:latestwith a specific version tag and its corresponding digest (e.g.,ghcr.io/hermetoproject/hermeto:v0.47.0@sha256:<digest>). Tagged container versions are available at https://github.com/hermetoproject/hermeto/pkgs/container/hermeto/versions.♻️ Proposed fix
-HERMETO_IMAGE="${HERMETO_IMAGE:-ghcr.io/hermetoproject/hermeto:latest}" +HERMETO_IMAGE="${HERMETO_IMAGE:-ghcr.io/hermetoproject/hermeto:v0.47.0@sha256:<digest>}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/run_hermeto_fetch_deps.sh` at line 21, Replace the mutable :latest image tag in the HERMETO_IMAGE default with an immutable tag+digest; update the HERMETO_IMAGE="${HERMETO_IMAGE:-ghcr.io/hermetoproject/hermeto:latest}" default to use a specific release and its sha256 digest (for example ghcr.io/hermetoproject/hermeto:v0.47.0@sha256:...) so builds are reproducible; ensure you use the chosen tag+digest from the Hermeto package versions and leave the HERMETO_IMAGE variable override behavior intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AGENTS.md`:
- Line 108: Update the validation step to fix the command typo: change the
second command from "hermeto-verify-pip-cuda" to include the missing "make"
prefix so it reads "make hermeto-verify-pip-cuda", keeping the first command
"make hermeto-verify-pip-cpu" and the HERMETO_OUT mention unchanged; ensure both
commands (make hermeto-verify-pip-cpu / make hermeto-verify-pip-cuda) are
consistent and copy-paste runnable in the AGENTS.md validation step.
In `@scripts/gen_containerfile_hermetic_sim.sh`:
- Around line 24-42: The script silently proceeds when the injection anchors
aren't present and references the wrong GPU Containerfile name; update the code
that sets/uses the base filename (replace Containerfile-cuda with
Containerfile-gpu or make it conditional) and add explicit existence/anchor
checks before running the awk pipeline: verify the base file exists, grep for
the "USER root" anchor and the "COPY LICENSE /licenses/LICENSE" anchor in the
selected Containerfile and exit non‑zero with a clear error if either is missing
so the script fails fast when anchors are not found.
In `@scripts/hermeto/prefetch-konflux-cuda.json`:
- Around line 8-10: In the CUDA manifest block replace the incorrect repo path
"cuda" with the repo root "." (matching the CPU manifest), change the
non-existent requirements-build.cuda.txt reference to requirements-build.txt,
and add the missing include_summary_in_sbom: true field (same placement as in
the CPU manifest) so the prefetch can find files and include the SBOM summary.
In `@scripts/hermeto/prefetch-pip-cuda.json`:
- Line 13: Update the JSON prefetch configs that set the
"requirements_build_files" array to reference the missing file: replace
"requirements-build.cuda.txt" with the existing "requirements-build.txt" in the
two prefetch config files that currently list that value under the
"requirements_build_files" key so CUDA prefetch uses the correct requirements
file.
In `@scripts/simulate_hermetic_build.sh`:
- Around line 33-37: The script simulate_hermetic_build.sh currently only
validates a single positional argument into kind ("cpu" or "cuda") and never
accepts or forwards the documented --model option; update argument parsing in
simulate_hermetic_build.sh to accept an optional --model <path> (or a second
positional model arg), set a MODEL variable when provided, validate kind as
before, and pass that MODEL (if set) to the downstream invocation of
scripts/stage_hermetic_build_context.sh so the staged build context receives the
model path; look for the kind variable, the initial [[ "${1:-}" == cpu || ... ]]
check, and the call to scripts/stage_hermetic_build_context.sh to add parsing
and forwarding logic.
- Around line 39-43: The current inline Python preflight (the python3 -c call
that reads p=pathlib.Path('$out')/'.build-config.json' and json.loads(...))
treats missing or malformed .build-config.json the same as a CPU prefetch and
emits the misleading "looks like CPU prefetch" message; change the check to
explicitly detect file existence and JSON parse errors before testing for 'cuda'
in project file names: first verify ${out}/.build-config.json exists, then try
to parse it and on parse failure emit a clear error mentioning the
invalid/malformed config, and only if parsing succeeds run the current
name-check and keep the CPU-prefetch message for the case where no 'cuda' file
names are found. Ensure you update the echo/exit branches so missing file vs
malformed JSON vs CPU-prefetch are distinct messages and still exit with nonzero
on error.
In `@scripts/stage_hermetic_build_context.sh`:
- Around line 24-31: Guard access to the next positional parameter before
assigning HERMETO_OUT and MODEL_SRC to avoid failing under set -u: in the option
parsing block that handles --hermeto-out and --model, check that "$2" is
non-empty (or that there is another argument) and if missing call the script's
usage/error function and exit, otherwise assign HERMETO_OUT="$2" (for
--hermeto-out) or MODEL_SRC="$2" (for --model") and shift 2; reference the
option labels (--hermeto-out, --model) and the variables HERMETO_OUT and
MODEL_SRC when making the change.
---
Nitpick comments:
In `@Makefile`:
- Around line 109-120: The Makefile target hermeto-fetch-deps is not listed as
phony and can be shadowed by a file of the same name; add hermeto-fetch-deps to
the .PHONY declaration (alongside hermeto-verify-pip-cpu hermeto-verify-pip-cuda
hermeto-verify-pip) so the Makefile always treats the hermeto-fetch-deps target
as a command rather than a file. Ensure the .PHONY line includes the exact token
hermeto-fetch-deps.
In `@scripts/run_hermeto_fetch_deps.sh`:
- Line 21: Replace the mutable :latest image tag in the HERMETO_IMAGE default
with an immutable tag+digest; update the
HERMETO_IMAGE="${HERMETO_IMAGE:-ghcr.io/hermetoproject/hermeto:latest}" default
to use a specific release and its sha256 digest (for example
ghcr.io/hermetoproject/hermeto:v0.47.0@sha256:...) so builds are reproducible;
ensure you use the chosen tag+digest from the Hermeto package versions and leave
the HERMETO_IMAGE variable override behavior intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: da50c2d0-b0f9-4420-8405-a3a4b6da1635
📒 Files selected for processing (11)
.gitignoreAGENTS.mdMakefilescripts/gen_containerfile_hermetic_sim.shscripts/hermeto/prefetch-konflux-cpu.jsonscripts/hermeto/prefetch-konflux-cuda.jsonscripts/hermeto/prefetch-pip-cpu.jsonscripts/hermeto/prefetch-pip-cuda.jsonscripts/run_hermeto_fetch_deps.shscripts/simulate_hermetic_build.shscripts/stage_hermetic_build_context.sh
| awk -v staging="$STAGING_REL" ' | ||
| /^USER root$/ { | ||
| print "" | ||
| print "# Injected by gen_containerfile_hermetic_sim.sh (local hermetic simulation)" | ||
| print "COPY " staging "/cachi2 /cachi2" | ||
| next | ||
| } | ||
| { print } | ||
| ' "$base" | awk -v staging="$STAGING_REL" ' | ||
| $0 ~ /^COPY LICENSE \/licenses\/LICENSE$/ { | ||
| print "" | ||
| print "# Hermeto project_files (file:// wheel paths) overlay" | ||
| print "COPY " staging "/patched-requirements/ ./" | ||
| next | ||
| } | ||
| { print } | ||
| ' |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/usr/bin/env bash
# Verify anchor availability in base containerfiles used by simulation.
set -euo pipefail
for f in Containerfile Containerfile-cuda; do
echo "== $f =="
rg -n '^USER root$|^COPY LICENSE /licenses/LICENSE$' "$f" || true
doneRepository: lightspeed-core/rag-content
Length of output: 237
🏁 Script executed:
# First, let's read the gen_containerfile_hermetic_sim.sh script to understand the full context
cat -n scripts/gen_containerfile_hermetic_sim.sh | head -60Repository: lightspeed-core/rag-content
Length of output: 1416
🏁 Script executed:
# Find all Containerfile variants in the repository
find . -name 'Containerfile*' -type fRepository: lightspeed-core/rag-content
Length of output: 106
🏁 Script executed:
# Check if there are any other base containerfiles or references
rg -l 'Containerfile|containerfile' --type sh --type yaml --type yml | head -20Repository: lightspeed-core/rag-content
Length of output: 98
🏁 Script executed:
# Show the exact lines with the anchors in Containerfile
rg -n -A 2 -B 2 '^USER root$|^COPY LICENSE /licenses/LICENSE$' ContainerfileRepository: lightspeed-core/rag-content
Length of output: 273
🏁 Script executed:
# Also check Containerfile-gpu to see if it's the cuda variant
head -30 Containerfile-gpuRepository: lightspeed-core/rag-content
Length of output: 974
🏁 Script executed:
# Check git log for Containerfile changes
git log --oneline --all -- '*Containerfile*' | head -20Repository: lightspeed-core/rag-content
Length of output: 1025
🏁 Script executed:
# Check if there's documentation about the naming
find . -type f \( -name "*.md" -o -name "*.txt" -o -name "README*" \) -exec rg -l 'Containerfile' {} \;Repository: lightspeed-core/rag-content
Length of output: 94
🏁 Script executed:
# Check the recent commit that added the gen_containerfile_hermetic_sim.sh
git show 729494d --stat | head -40Repository: lightspeed-core/rag-content
Length of output: 2368
🏁 Script executed:
# Check the README for any notes about the GPU/CUDA variant
grep -A 5 -B 5 -i 'cuda\|gpu' README.mdRepository: lightspeed-core/rag-content
Length of output: 654
Add validation to fail fast when injection anchors are missing.
The awk pipeline silently succeeds even if the anchor patterns (lines 25 and 34) are never matched, potentially producing non-hermetic Containerfiles without warning.
Additionally, there's a filename bug on line 21: the script references Containerfile-cuda but the actual file is Containerfile-gpu. The GPU variant is also missing both required anchors (USER root and COPY LICENSE /licenses/LICENSE), which would need to be added before this works.
The proposed grep validation checks address the fail-fast concern:
Proposed additions
+grep -q '^USER root$' "$base" || {
+ echo "error: missing anchor 'USER root' in $base" >&2
+ exit 1
+}
+grep -q '^COPY LICENSE /licenses/LICENSE$' "$base" || {
+ echo "error: missing anchor 'COPY LICENSE /licenses/LICENSE' in $base" >&2
+ exit 1
+}
+
awk -v staging="$STAGING_REL" '🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/gen_containerfile_hermetic_sim.sh` around lines 24 - 42, The script
silently proceeds when the injection anchors aren't present and references the
wrong GPU Containerfile name; update the code that sets/uses the base filename
(replace Containerfile-cuda with Containerfile-gpu or make it conditional) and
add explicit existence/anchor checks before running the awk pipeline: verify the
base file exists, grep for the "USER root" anchor and the "COPY LICENSE
/licenses/LICENSE" anchor in the selected Containerfile and exit non‑zero with a
clear error if either is missing so the script fails fast when anchors are not
found.
| "type": "rpm", | ||
| "path": "cuda" | ||
| }, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/usr/bin/env bash
# Verify required paths/files for CUDA full manifest.
set -euo pipefail
echo "== Check cuda directory =="
fd -HI '^cuda$' -td
echo "== Check CUDA build requirements file =="
fd -HI '^requirements-build\.cuda\.txt$'
echo "== Compare CPU vs CUDA konflux manifest fields =="
rg -n '"type": "rpm"|"path": ".*"|"requirements_build_files"|include_summary_in_sbom' \
scripts/hermeto/prefetch-konflux-cpu.json \
scripts/hermeto/prefetch-konflux-cuda.jsonRepository: lightspeed-core/rag-content
Length of output: 985
CUDA manifest has blocking path and file mismatches that will cause prefetch to fail.
Line 9 uses path: "cuda" (which doesn't exist) and Line 22 uses requirements-build.cuda.txt (which doesn't exist). The CPU manifest pattern (Lines 9 and 23) correctly uses repo root "." with requirements-build.txt. Additionally, the CUDA manifest is missing the include_summary_in_sbom: true field present in the CPU manifest.
Required fix
{
"type": "rpm",
- "path": "cuda"
+ "path": ".",
+ "include_summary_in_sbom": true
},
@@
- "requirements_build_files": ["requirements-build.cuda.txt"],
+ "requirements_build_files": ["requirements-build.txt"],🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/hermeto/prefetch-konflux-cuda.json` around lines 8 - 10, In the CUDA
manifest block replace the incorrect repo path "cuda" with the repo root "."
(matching the CPU manifest), change the non-existent requirements-build.cuda.txt
reference to requirements-build.txt, and add the missing
include_summary_in_sbom: true field (same placement as in the CPU manifest) so
the prefetch can find files and include the SBOM summary.
| "requirements.hashes.source.cuda.txt", | ||
| "requirements.hermetic.txt" | ||
| ], | ||
| "requirements_build_files": ["requirements-build.cuda.txt"], |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/usr/bin/env bash
# Verify whether CUDA build requirements file exists and where it's referenced.
set -euo pipefail
fd -HI '^requirements-build\.cuda\.txt$'
rg -n '"requirements_build_files"' scripts/hermeto/prefetch-pip-cuda.json scripts/hermeto/prefetch-konflux-cuda.jsonRepository: lightspeed-core/rag-content
Length of output: 288
🏁 Script executed:
#!/usr/bin/env bash
# Check for all requirements-related files
set -euo pipefail
echo "=== Searching for requirements-build files ==="
fd -HI 'requirements-build' .
echo -e "\n=== Directory listing of scripts/hermeto ==="
ls -la scripts/hermeto/ | grep -E '(requirements|txt)' || echo "No matches in ls"
echo -e "\n=== Checking for any txt files in scripts/hermeto ==="
fd '\.txt$' scripts/hermeto/Repository: lightspeed-core/rag-content
Length of output: 260
🏁 Script executed:
#!/usr/bin/env bash
set -euo pipefail
echo "=== Content of prefetch-pip-cuda.json ==="
cat -n scripts/hermeto/prefetch-pip-cuda.json
echo -e "\n=== Content of prefetch-konflux-cuda.json ==="
cat -n scripts/hermeto/prefetch-konflux-cuda.jsonRepository: lightspeed-core/rag-content
Length of output: 3054
requirements_build_files references a missing file in both CUDA prefetch configurations.
Both scripts/hermeto/prefetch-pip-cuda.json (line 13) and scripts/hermeto/prefetch-konflux-cuda.json (line 22) reference requirements-build.cuda.txt, which does not exist. This will cause CUDA prefetch to fail before dependency resolution. Change both to requirements-build.txt, which exists at the repository root.
Proposed fixes
In scripts/hermeto/prefetch-pip-cuda.json (line 13):
- "requirements_build_files": ["requirements-build.cuda.txt"],
+ "requirements_build_files": ["requirements-build.txt"],In scripts/hermeto/prefetch-konflux-cuda.json (line 22):
- "requirements_build_files": ["requirements-build.cuda.txt"],
+ "requirements_build_files": ["requirements-build.txt"],📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "requirements_build_files": ["requirements-build.cuda.txt"], | |
| "requirements_build_files": ["requirements-build.txt"], |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/hermeto/prefetch-pip-cuda.json` at line 13, Update the JSON prefetch
configs that set the "requirements_build_files" array to reference the missing
file: replace "requirements-build.cuda.txt" with the existing
"requirements-build.txt" in the two prefetch config files that currently list
that value under the "requirements_build_files" key so CUDA prefetch uses the
correct requirements file.
| out="${HERMETO_OUT:-$ROOT/.hermeto-output}" | ||
| if [[ "$kind" == cuda ]]; then | ||
| if ! python3 -c "import json, pathlib; p=pathlib.Path('$out')/'.build-config.json'; d=json.loads(p.read_text()); names=[pathlib.Path(x['abspath']).name for x in d.get('project_files',[])]; raise SystemExit(0 if any('cuda' in n for n in names) else 1)"; then | ||
| echo "error: HERMETO_OUT ($out) looks like CPU prefetch; run pip-cuda into this directory first" >&2 | ||
| exit 1 |
There was a problem hiding this comment.
CUDA preflight error message is misleading for missing/invalid config.
If .build-config.json is absent or malformed, this path still reports “looks like CPU prefetch”, which obscures the real failure.
Suggested fix
out="${HERMETO_OUT:-$ROOT/.hermeto-output}"
if [[ "$kind" == cuda ]]; then
+ [[ -f "$out/.build-config.json" ]] || {
+ echo "error: missing $out/.build-config.json (run pip-cuda prefetch first)" >&2
+ exit 1
+ }
if ! python3 -c "import json, pathlib; p=pathlib.Path('$out')/'.build-config.json'; d=json.loads(p.read_text()); names=[pathlib.Path(x['abspath']).name for x in d.get('project_files',[])]; raise SystemExit(0 if any('cuda' in n for n in names) else 1)"; then
- echo "error: HERMETO_OUT ($out) looks like CPU prefetch; run pip-cuda into this directory first" >&2
+ echo "error: HERMETO_OUT ($out) is not a CUDA prefetch output (expected cuda project_files entries)" >&2
exit 1
fi
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| out="${HERMETO_OUT:-$ROOT/.hermeto-output}" | |
| if [[ "$kind" == cuda ]]; then | |
| if ! python3 -c "import json, pathlib; p=pathlib.Path('$out')/'.build-config.json'; d=json.loads(p.read_text()); names=[pathlib.Path(x['abspath']).name for x in d.get('project_files',[])]; raise SystemExit(0 if any('cuda' in n for n in names) else 1)"; then | |
| echo "error: HERMETO_OUT ($out) looks like CPU prefetch; run pip-cuda into this directory first" >&2 | |
| exit 1 | |
| out="${HERMETO_OUT:-$ROOT/.hermeto-output}" | |
| if [[ "$kind" == cuda ]]; then | |
| [[ -f "$out/.build-config.json" ]] || { | |
| echo "error: missing $out/.build-config.json (run pip-cuda prefetch first)" >&2 | |
| exit 1 | |
| } | |
| if ! python3 -c "import json, pathlib; p=pathlib.Path('$out')/'.build-config.json'; d=json.loads(p.read_text()); names=[pathlib.Path(x['abspath']).name for x in d.get('project_files',[])]; raise SystemExit(0 if any('cuda' in n for n in names) else 1)"; then | |
| echo "error: HERMETO_OUT ($out) is not a CUDA prefetch output (expected cuda project_files entries)" >&2 | |
| exit 1 | |
| fi | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/simulate_hermetic_build.sh` around lines 39 - 43, The current inline
Python preflight (the python3 -c call that reads
p=pathlib.Path('$out')/'.build-config.json' and json.loads(...)) treats missing
or malformed .build-config.json the same as a CPU prefetch and emits the
misleading "looks like CPU prefetch" message; change the check to explicitly
detect file existence and JSON parse errors before testing for 'cuda' in project
file names: first verify ${out}/.build-config.json exists, then try to parse it
and on parse failure emit a clear error mentioning the invalid/malformed config,
and only if parsing succeeds run the current name-check and keep the
CPU-prefetch message for the case where no 'cuda' file names are found. Ensure
you update the echo/exit branches so missing file vs malformed JSON vs
CPU-prefetch are distinct messages and still exit with nonzero on error.
| --hermeto-out) | ||
| HERMETO_OUT="$2" | ||
| shift 2 | ||
| ;; | ||
| --model) | ||
| MODEL_SRC="$2" | ||
| shift 2 | ||
| ;; |
There was a problem hiding this comment.
Guard option values before reading $2.
--hermeto-out and --model assume a following value. With set -u, missing values fail with an unhelpful shell error instead of a clear usage message.
Suggested fix
while [[ "${1:-}" ]]; do
case "$1" in
--hermeto-out)
+ [[ $# -ge 2 ]] || { echo "error: --hermeto-out requires a directory" >&2; exit 1; }
HERMETO_OUT="$2"
shift 2
;;
--model)
+ [[ $# -ge 2 ]] || { echo "error: --model requires a file path" >&2; exit 1; }
MODEL_SRC="$2"
shift 2
;;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| --hermeto-out) | |
| HERMETO_OUT="$2" | |
| shift 2 | |
| ;; | |
| --model) | |
| MODEL_SRC="$2" | |
| shift 2 | |
| ;; | |
| --hermeto-out) | |
| [[ $# -ge 2 ]] || { echo "error: --hermeto-out requires a directory" >&2; exit 1; } | |
| HERMETO_OUT="$2" | |
| shift 2 | |
| ;; | |
| --model) | |
| [[ $# -ge 2 ]] || { echo "error: --model requires a file path" >&2; exit 1; } | |
| MODEL_SRC="$2" | |
| shift 2 | |
| ;; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/stage_hermetic_build_context.sh` around lines 24 - 31, Guard access
to the next positional parameter before assigning HERMETO_OUT and MODEL_SRC to
avoid failing under set -u: in the option parsing block that handles
--hermeto-out and --model, check that "$2" is non-empty (or that there is
another argument) and if missing call the script's usage/error function and
exit, otherwise assign HERMETO_OUT="$2" (for --hermeto-out) or MODEL_SRC="$2"
(for --model") and shift 2; reference the option labels (--hermeto-out, --model)
and the variables HERMETO_OUT and MODEL_SRC when making the change.
729494d to
64e13f7
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
scripts/stage_hermetic_build_context.sh (1)
24-31:⚠️ Potential issue | 🟡 MinorGuard option values before reading
$2.
--hermeto-outand--modelstill assume a following argument; withset -u, missing values fail with an unhelpful shell error.Suggested patch
while [[ "${1:-}" ]]; do case "$1" in --hermeto-out) + [[ $# -ge 2 ]] || { echo "error: --hermeto-out requires a directory" >&2; exit 1; } HERMETO_OUT="$2" shift 2 ;; --model) + [[ $# -ge 2 ]] || { echo "error: --model requires a file path" >&2; exit 1; } MODEL_SRC="$2" shift 2 ;;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/stage_hermetic_build_context.sh` around lines 24 - 31, The case arms handling the --hermeto-out and --model options read "$2" directly which breaks with set -u when the option is the last arg; update the handlers for the --hermeto-out and --model branches to first validate that a following argument exists and is not another option (e.g., check "${2-}" is non-empty and does not start with '-') before assigning HERMETO_OUT or MODEL_SRC and shifting; if the check fails, print a clear error message and exit with non-zero status so missing/invalid option values are handled gracefully.AGENTS.md (1)
108-108:⚠️ Potential issue | 🟡 MinorFix the command typo in the validation step.
The second command is missing
make, so copy/paste execution fails.Suggested patch
-1. Run Hermeto (**`make hermeto-verify-pip-cpu`** / **`hermeto-verify-pip-cuda`**, or set **`HERMETO_OUT`**) so outputs land in **`.hermeto-output-verify-*`** or a directory of your choice. +1. Run Hermeto (**`make hermeto-verify-pip-cpu`** / **`make hermeto-verify-pip-cuda`**, or set **`HERMETO_OUT`**) so outputs land in **`.hermeto-output-verify-*`** or a directory of your choice.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` at line 108, The validation step has a typo: the second command is missing the leading "make" (currently shows "hermeto-verify-pip-cuda"); update the text so both variants use the make invocation (e.g., "make hermeto-verify-pip-cuda") so copy/paste execution works, keeping the rest of the sentence referencing HERMETO_OUT and .hermeto-output-verify-* unchanged.
🧹 Nitpick comments (1)
scripts/stage_hermetic_build_context.sh (1)
85-90: Prevent silent overwrite when twoproject_filesshare the same basename.Writing patched outputs by
Path(pf["abspath"]).namecan overwrite earlier files with the same filename.Suggested patch
data = json.loads(build_config.read_text()) +seen = set() for pf in data.get("project_files", []): path = Path(pf["abspath"]) name = path.name + if name in seen: + raise SystemExit(f"error: duplicate project_files basename: {name}") + seen.add(name) text = pf["template"].replace("${output_dir}", "/cachi2/output") (out_dir / name).write_text(text) print("patched", name)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/stage_hermetic_build_context.sh` around lines 85 - 90, The loop that writes patched files uses Path(pf["abspath"]).name and writes to (out_dir / name), which silently overwrites when multiple project_files share the same basename; update the logic in that block to produce unique output paths (e.g., preserve the original relative path, prefix/suffix the basename with a stable disambiguator like an index or hashed abspath, or create subdirectories reflecting the source path), and before writing with (out_dir / name) check for existence and either fail with a clear error or choose the deterministic unique name; modify the variables in this scope (pf, path, name, and the write target) accordingly so no two entries map to the same output filename.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AGENTS.md`:
- Around line 108-113: The docs currently send Hermeto outputs to
.hermeto-output-verify-* in step 1 but steps 2 and 3 assume .hermeto-output;
update the instructions to consistently use the HERMETO_OUT environment variable
(or explicitly document exporting/passing it) so stage_hermetic_build_context.sh
and simulate_hermetic_build.sh consume the same path as the hermeto-verify-pip-*
runs; specifically, mention exporting HERMETO_OUT before step 1 or show calling
./scripts/stage_hermetic_build_context.sh and
./scripts/simulate_hermetic_build.sh with HERMETO_OUT set, and ensure references
to HERMETO_OUT appear in steps 1–3 and the description of those scripts
(stage_hermetic_build_context.sh, simulate_hermetic_build.sh) so the
default-path mismatch is resolved.
In `@Makefile`:
- Around line 109-110: The Makefile target hermeto-fetch-deps should be declared
phony so make won't treat an identically named file/dir as up-to-date; add a
.PHONY declaration that includes hermeto-fetch-deps (and mirror the same change
for the other similar Hermeto target noted in the review) so these recipe names
are always executed regardless of filesystem entries.
---
Duplicate comments:
In `@AGENTS.md`:
- Line 108: The validation step has a typo: the second command is missing the
leading "make" (currently shows "hermeto-verify-pip-cuda"); update the text so
both variants use the make invocation (e.g., "make hermeto-verify-pip-cuda") so
copy/paste execution works, keeping the rest of the sentence referencing
HERMETO_OUT and .hermeto-output-verify-* unchanged.
In `@scripts/stage_hermetic_build_context.sh`:
- Around line 24-31: The case arms handling the --hermeto-out and --model
options read "$2" directly which breaks with set -u when the option is the last
arg; update the handlers for the --hermeto-out and --model branches to first
validate that a following argument exists and is not another option (e.g., check
"${2-}" is non-empty and does not start with '-') before assigning HERMETO_OUT
or MODEL_SRC and shifting; if the check fails, print a clear error message and
exit with non-zero status so missing/invalid option values are handled
gracefully.
---
Nitpick comments:
In `@scripts/stage_hermetic_build_context.sh`:
- Around line 85-90: The loop that writes patched files uses
Path(pf["abspath"]).name and writes to (out_dir / name), which silently
overwrites when multiple project_files share the same basename; update the logic
in that block to produce unique output paths (e.g., preserve the original
relative path, prefix/suffix the basename with a stable disambiguator like an
index or hashed abspath, or create subdirectories reflecting the source path),
and before writing with (out_dir / name) check for existence and either fail
with a clear error or choose the deterministic unique name; modify the variables
in this scope (pf, path, name, and the write target) accordingly so no two
entries map to the same output filename.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 62542530-4db0-459d-97ae-a4033fbc79b1
📒 Files selected for processing (11)
.gitignoreAGENTS.mdMakefilescripts/gen_containerfile_hermetic_sim.shscripts/hermeto/prefetch-konflux-cpu.jsonscripts/hermeto/prefetch-konflux-cuda.jsonscripts/hermeto/prefetch-pip-cpu.jsonscripts/hermeto/prefetch-pip-cuda.jsonscripts/run_hermeto_fetch_deps.shscripts/simulate_hermetic_build.shscripts/stage_hermetic_build_context.sh
✅ Files skipped from review due to trivial changes (5)
- .gitignore
- scripts/hermeto/prefetch-pip-cuda.json
- scripts/hermeto/prefetch-konflux-cuda.json
- scripts/hermeto/prefetch-konflux-cpu.json
- scripts/hermeto/prefetch-pip-cpu.json
🚧 Files skipped from review as they are similar to previous changes (3)
- scripts/gen_containerfile_hermetic_sim.sh
- scripts/simulate_hermetic_build.sh
- scripts/run_hermeto_fetch_deps.sh
| hermeto-fetch-deps: ## run Hermeto prefetch (HERMETO_MODE=pip-cpu|pip-cuda|full-cpu|full-cuda); podman/docker + network | ||
| @./scripts/run_hermeto_fetch_deps.sh "$(HERMETO_MODE)" |
There was a problem hiding this comment.
Declare hermeto-fetch-deps as phony to avoid accidental no-op.
If a file/directory named hermeto-fetch-deps exists, make can skip this recipe.
Suggested patch
-.PHONY: hermeto-verify-pip-cpu hermeto-verify-pip-cuda hermeto-verify-pip
+.PHONY: hermeto-fetch-deps hermeto-verify-pip-cpu hermeto-verify-pip-cuda hermeto-verify-pipAlso applies to: 115-115
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Makefile` around lines 109 - 110, The Makefile target hermeto-fetch-deps
should be declared phony so make won't treat an identically named file/dir as
up-to-date; add a .PHONY declaration that includes hermeto-fetch-deps (and
mirror the same change for the other similar Hermeto target noted in the review)
so these recipe names are always executed regardless of filesystem entries.
64e13f7 to
cfbb1c0
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (4)
scripts/simulate_hermetic_build.sh (2)
33-37:⚠️ Potential issue | 🟠 MajorEntrypoint still doesn’t accept/forward
--model, so documented flow is unusable.Line 33 only accepts
cpu|cuda, and Line 59 never forwards a model path to staging. That breaks the documented model fallback path in this script’s caveats.Proposed fix
-[[ "${1:-}" == cpu || "${1:-}" == cuda ]] || { - echo "usage: $0 cpu|cuda" >&2 +[[ "${1:-}" == cpu || "${1:-}" == cuda ]] || { + echo "usage: $0 cpu|cuda [--model PATH]" >&2 exit 1 } kind="$1" +shift + +MODEL_SRC="" +while [[ "${1:-}" ]]; do + case "$1" in + --model) + [[ $# -ge 2 ]] || { echo "error: --model requires a path" >&2; exit 1; } + MODEL_SRC="$2" + shift 2 + ;; + *) + echo "usage: $0 cpu|cuda [--model PATH]" >&2 + exit 1 + ;; + esac +done @@ -"$ROOT/scripts/stage_hermetic_build_context.sh" --hermeto-out "${HERMETO_OUT:-$ROOT/.hermeto-output}" +stage_args=(--hermeto-out "${HERMETO_OUT:-$ROOT/.hermeto-output}") +if [[ -n "$MODEL_SRC" ]]; then + stage_args+=(--model "$MODEL_SRC") +fi +"$ROOT/scripts/stage_hermetic_build_context.sh" "${stage_args[@]}"Also applies to: 59-59
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/simulate_hermetic_build.sh` around lines 33 - 37, The script currently only accepts "cpu" or "cuda" into the kind variable and never forwards a model path, so update argument parsing to accept an optional --model <path> (or a positional third arg) and populate a model_path variable, validate it if needed, then ensure the staging/launch command that currently references kind also includes and forwards model_path (e.g., append "--model $model_path" or the equivalent variable) where the script prepares/stages the model (the command invoked around the current staging site referenced by the code near line 59); make sure to update any usage text and error handling to reflect the new --model option.
39-43:⚠️ Potential issue | 🟡 MinorCUDA preflight conflates missing/malformed config with CPU-prefetch mismatch.
At Line 41, all Python failures fall through to the Line 42 message (“looks like CPU prefetch”). Missing or invalid
.build-config.jsonshould be surfaced distinctly.Proposed fix
out="${HERMETO_OUT:-$ROOT/.hermeto-output}" if [[ "$kind" == cuda ]]; then - if ! python3 -c "import json, pathlib; p=pathlib.Path('$out')/'.build-config.json'; d=json.loads(p.read_text()); names=[pathlib.Path(x['abspath']).name for x in d.get('project_files',[])]; raise SystemExit(0 if any('cuda' in n for n in names) else 1)"; then - echo "error: HERMETO_OUT ($out) looks like CPU prefetch; run pip-cuda into this directory first" >&2 + cfg="$out/.build-config.json" + [[ -f "$cfg" ]] || { + echo "error: missing $cfg (run pip-cuda prefetch first)" >&2 + exit 1 + } + if ! python3 - <<'PY' "$cfg"; then +import json, pathlib, sys +cfg = pathlib.Path(sys.argv[1]) +try: + d = json.loads(cfg.read_text()) +except Exception as e: + print(f"error: malformed {cfg}: {e}", file=sys.stderr) + raise SystemExit(2) +names = [pathlib.Path(x.get('abspath', '')).name for x in d.get('project_files', [])] +raise SystemExit(0 if any('cuda' in n for n in names) else 1) +PY + if [[ $? -eq 1 ]]; then + echo "error: HERMETO_OUT ($out) is not a CUDA prefetch output (missing cuda project_files entries)" >&2 + fi exit 1 fi fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/simulate_hermetic_build.sh` around lines 39 - 43, The current CUDA preflight conflates any Python failure with a CPU-prefetch mismatch; update the simulate_hermetic_build.sh preflight around variable out and the python3 -c check so missing or unreadable $out/.build-config.json and JSON parse errors are detected and reported separately from the CPU-prefetch mismatch: first check the file exists and is readable and emit a clear "missing or unreadable .build-config.json" error if not, then run a Python check that distinguishes JSONDecodeError (or other parse errors) and prints that specific parse error before exiting nonzero, and finally only print the existing "looks like CPU prefetch" message when the Python check succeeds but finds no 'cuda' in project file names (referencing the python3 -c one-liner and the $out/.build-config.json path).AGENTS.md (2)
108-108:⚠️ Potential issue | 🟡 MinorStep 1 command typo: missing
makebeforehermeto-verify-pip-cuda.Line 108 currently makes the second command non-runnable as written.
Proposed fix
-1. Run Hermeto (**`make hermeto-verify-pip-cpu`** / **`hermeto-verify-pip-cuda`**, or set **`HERMETO_OUT`**) so outputs land in **`.hermeto-output-verify-*`** or a directory of your choice. +1. Run Hermeto (**`make hermeto-verify-pip-cpu`** / **`make hermeto-verify-pip-cuda`**, or set **`HERMETO_OUT`**) so outputs land in **`.hermeto-output-verify-*`** or a directory of your choice.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` at line 108, Fix the typo in the documented Step 1 command: change the second command token from "hermeto-verify-pip-cuda" to "make hermeto-verify-pip-cuda" so both commands read "make hermeto-verify-pip-cpu" and "make hermeto-verify-pip-cuda" (refer to the command strings `make hermeto-verify-pip-cpu` and `hermeto-verify-pip-cuda` in the current text).
108-113:⚠️ Potential issue | 🟠 MajorAlign
HERMETO_OUTacross steps 1–3 to avoid default-path mismatch.Line 108 points users to
.hermeto-output-verify-*, but Lines 109–110 show commands that default to.hermeto-outputunlessHERMETO_OUTis exported/passed explicitly.Proposed fix
-1. Run Hermeto (**`make hermeto-verify-pip-cpu`** / **`make hermeto-verify-pip-cuda`**, or set **`HERMETO_OUT`**) so outputs land in **`.hermeto-output-verify-*`** or a directory of your choice. -2. **`./scripts/stage_hermetic_build_context.sh`** — copies **`deps/`** to **`.hermetic-staging/cachi2/output/`**, writes **`cachi2.env`** (**`PIP_FIND_LINKS=/cachi2/output/deps/pip`**), and writes **`.hermetic-staging/patched-requirements/`** from **`.build-config.json`** **`project_files`** with **`/cachi2/output`**. -3. **`./scripts/simulate_hermetic_build.sh cpu`** (or **`cuda`**) — generates **`.hermetic-staging/Containerfile.sim.*`** (early **`COPY`** of **`cachi2`**, overlay **`COPY`** of patched requirements) and runs **`podman`/`docker` `build`**. +1. Run Hermeto (**`make hermeto-verify-pip-cpu`** / **`make hermeto-verify-pip-cuda`**) and set **`HERMETO_OUT`** to that output dir (for example: **`export HERMETO_OUT=.hermeto-output-verify-cpu`**). +2. **`./scripts/stage_hermetic_build_context.sh --hermeto-out "$HERMETO_OUT"`** — copies **`deps/`** to **`.hermetic-staging/cachi2/output/`**, writes **`cachi2.env`** (**`PIP_FIND_LINKS=/cachi2/output/deps/pip`**), and writes **`.hermetic-staging/patched-requirements/`** from **`.build-config.json`** **`project_files`** with **`/cachi2/output`**. +3. **`HERMETO_OUT="$HERMETO_OUT" ./scripts/simulate_hermetic_build.sh cpu`** (or **`cuda`**) — generates **`.hermetic-staging/Containerfile.sim.*`** (early **`COPY`** of **`cachi2`**, overlay **`COPY`** of patched requirements) and runs **`podman`/`docker` `build`**.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` around lines 108 - 113, The docs currently reference `.hermeto-output-verify-*` but later commands (`./scripts/stage_hermetic_build_context.sh`, `./scripts/simulate_hermetic_build.sh`, `make hermeto-verify-pip-cpu`/`hermeto-verify-pip-cuda`) will use the default `HERMETO_OUT` unless it is exported or passed explicitly; update the steps to consistently show exporting or passing `HERMETO_OUT` (e.g., call out `export HERMETO_OUT=...` or `HERMETO_OUT=... ./scripts/stage_hermetic_build_context.sh`) before step 1 or inline with the script invocations so all steps reference the same output directory and avoid the `.hermeto-output` vs `.hermeto-output-verify-*` mismatch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@AGENTS.md`:
- Line 108: Fix the typo in the documented Step 1 command: change the second
command token from "hermeto-verify-pip-cuda" to "make hermeto-verify-pip-cuda"
so both commands read "make hermeto-verify-pip-cpu" and "make
hermeto-verify-pip-cuda" (refer to the command strings `make
hermeto-verify-pip-cpu` and `hermeto-verify-pip-cuda` in the current text).
- Around line 108-113: The docs currently reference `.hermeto-output-verify-*`
but later commands (`./scripts/stage_hermetic_build_context.sh`,
`./scripts/simulate_hermetic_build.sh`, `make
hermeto-verify-pip-cpu`/`hermeto-verify-pip-cuda`) will use the default
`HERMETO_OUT` unless it is exported or passed explicitly; update the steps to
consistently show exporting or passing `HERMETO_OUT` (e.g., call out `export
HERMETO_OUT=...` or `HERMETO_OUT=... ./scripts/stage_hermetic_build_context.sh`)
before step 1 or inline with the script invocations so all steps reference the
same output directory and avoid the `.hermeto-output` vs
`.hermeto-output-verify-*` mismatch.
In `@scripts/simulate_hermetic_build.sh`:
- Around line 33-37: The script currently only accepts "cpu" or "cuda" into the
kind variable and never forwards a model path, so update argument parsing to
accept an optional --model <path> (or a positional third arg) and populate a
model_path variable, validate it if needed, then ensure the staging/launch
command that currently references kind also includes and forwards model_path
(e.g., append "--model $model_path" or the equivalent variable) where the script
prepares/stages the model (the command invoked around the current staging site
referenced by the code near line 59); make sure to update any usage text and
error handling to reflect the new --model option.
- Around line 39-43: The current CUDA preflight conflates any Python failure
with a CPU-prefetch mismatch; update the simulate_hermetic_build.sh preflight
around variable out and the python3 -c check so missing or unreadable
$out/.build-config.json and JSON parse errors are detected and reported
separately from the CPU-prefetch mismatch: first check the file exists and is
readable and emit a clear "missing or unreadable .build-config.json" error if
not, then run a Python check that distinguishes JSONDecodeError (or other parse
errors) and prints that specific parse error before exiting nonzero, and finally
only print the existing "looks like CPU prefetch" message when the Python check
succeeds but finds no 'cuda' in project file names (referencing the python3 -c
one-liner and the $out/.build-config.json path).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cfcc3385-7785-452e-a600-6b8d8c566ccd
📒 Files selected for processing (11)
.gitignoreAGENTS.mdMakefilescripts/gen_containerfile_hermetic_sim.shscripts/hermeto/prefetch-konflux-cpu.jsonscripts/hermeto/prefetch-konflux-cuda.jsonscripts/hermeto/prefetch-pip-cpu.jsonscripts/hermeto/prefetch-pip-cuda.jsonscripts/run_hermeto_fetch_deps.shscripts/simulate_hermetic_build.shscripts/stage_hermetic_build_context.sh
✅ Files skipped from review due to trivial changes (5)
- .gitignore
- scripts/hermeto/prefetch-pip-cuda.json
- scripts/hermeto/prefetch-pip-cpu.json
- scripts/hermeto/prefetch-konflux-cuda.json
- scripts/gen_containerfile_hermetic_sim.sh
🚧 Files skipped from review as they are similar to previous changes (4)
- scripts/hermeto/prefetch-konflux-cpu.json
- Makefile
- scripts/stage_hermetic_build_context.sh
- scripts/run_hermeto_fetch_deps.sh
Description
Added shell scripts for running hermeto (the prefetch step of hermetic Konflux builds) and instructions for AI agents on how to use it. This is needed for the agent to run a modify-test-debug loop for Python dependency updates locally, as opposed to a human engineer needing to commit/push/wait on the Konflux pipeline.
Strictly speaking this PR depends on LCORE-1440, but broken into a separate PR to simplify review.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Documentation
Chores