LCORE-1440: Build CUDA GPU and arm64 versions of rag-content in Konflux #107
syedriko wants to merge 2 commits into lightspeed-core:main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough: This PR introduces comprehensive CUDA support alongside the existing CPU variants for the RAG content system, including new Containerfile definitions, multi-architecture Tekton pipeline specifications, CUDA-specific dependency management scripts, and embedding model training infrastructure. GitHub workflows are redirected to a feature branch, and build/dependency configuration is restructured to manage CPU and CUDA package resolution paths separately.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
/ok-to-test
.tekton/rag-tool-pull-request.yaml (outdated)
```diff
 "requirements_build_files": ["requirements-build.txt"],
 "binary": {
-  "packages": "aiohappyeyeballs,aiohttp,aiosignal,aiosqlite,annotated-doc,annotated-types,anyio,asyncpg,attrs,beautifulsoup4,cffi,click,colorama,cryptography,dataclasses-json,defusedxml,distro,docling-parse,et-xmlfile,faiss-cpu,filetype,fire,frozenlist,greenlet,h11,hf-xet,httpcore,httpx,idna,jinja2,jiter,joblib,jsonlines,jsonref,jsonschema-specifications,latex2mathml,lxml,markdown-it-py,markupsafe,mdurl,mpire,mpmath,mypy-extensions,nest-asyncio,networkx,numpy,openpyxl,packaging,pandas,pillow,pluggy,prompt-toolkit,propcache,psycopg2-binary,pydantic,pydantic-core,pygments,pylatexenc,pypdfium2,python-dateutil,python-docx,python-pptx,pyyaml,referencing,requests,rpds-py,rtree,safetensors,scikit-learn,scipy,setuptools,shapely,shellingham,six,sniffio,sympy,threadpoolctl,tiktoken,tokenizers,torch,torchvision,transformers,triton,typing-extensions,typing-inspect,typing-inspection,tzdata,wrapt,xlsxwriter,zipp,uv-build,uv,pip,maturin,opencv-python,omegaconf,rapidocr,sqlite-vec,griffe,griffecli,griffelib,pyclipper,tree-sitter-typescript",
+  "packages": "torch,torchvision,triton,uv-build,uv,pip,maturin,opencv-python,omegaconf,rapidocr,sqlite-vec,griffe,griffecli,griffelib,pyclipper,tree-sitter-typescript,hf-xet,psycopg2-binary,docling-parse,pypdf,pypdfium2,jiter,pylatexenc,aiohappyeyeballs,aiohttp,aiosignal,aiosqlite,annotated-doc,annotated-types,anyio,asyncpg,beautifulsoup4,cffi,click,colorama,cryptography,dataclasses-json,defusedxml,distro,et-xmlfile,faiss-cpu,filetype,fire,frozenlist,greenlet,h11,httpcore,httpx,httpx-sse,idna,jinja2,joblib,jsonlines,jsonref,jsonschema-specifications,lxml,markdown-it-py,markupsafe,mdurl,mpire,mpmath,mypy-extensions,nest-asyncio,networkx,numpy,openpyxl,packaging,pandas,pillow,pluggy,prompt-toolkit,propcache,pydantic,pydantic-core,pygments,python-dateutil,python-docx,python-pptx,pyyaml,referencing,requests,rpds-py,rtree,safetensors,scikit-learn,scipy,setuptools,shapely,shellingham,six,sniffio,sympy,threadpoolctl,tiktoken,tokenizers,transformers,typing-extensions,typing-inspect,typing-inspection,tzdata,xlsxwriter,zipp",
```
@syedriko I have a question: I do not see `"allow_binary": "true"` specified for `"type": "pip"`. In that case, are the packages defined in the `"binary"` section installed from binaries? I think the default value of `"allow_binary"` is `"false"`.
hermeto was printing out

```
2026-03-20 20:13:08,875 WARNING The 'allow_binary' field is deprecated and will be removed in the next major version. Please use 'binary': {} instead of 'allow_binary': true.
```

so I removed allow_binary.
Force-pushed from 7694a7c to 3e020e4
Actionable comments posted: 10
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
scripts/remove_pytorch_cpu_pyproject.py (1)
4-10: ⚠️ Potential issue | 🟡 Minor
Update usage docs to match new CLI behavior.

The docstring still implies the script only operates on `pyproject.toml` in the current directory, but line 49 now supports passing a custom path. This can mislead users in build scripts.

📝 Proposed docstring fix
```diff
 Usage:
-    python remove_pytorch_cpu_pyproject.py
+    python remove_pytorch_cpu_pyproject.py [path/to/pyproject.toml]

-The script will remove the 'tool.uv.index' and 'tool.uv.sources' sections
-from the pyproject.toml file in the current directory.
+The script removes the 'tool.uv.index' and 'tool.uv.sources' sections
+from the provided pyproject.toml file, or from ./pyproject.toml by default.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/remove_pytorch_cpu_pyproject.py` around lines 4 - 10, Update the module docstring and top-level usage/help text in remove_pytorch_cpu_pyproject.py to reflect the new CLI behavior: state that the script accepts an optional pyproject file path (defaulting to ./pyproject.toml) and demonstrate both usages (no-arg uses current directory and passing a path uses that file), and ensure any mention of line 49's argparse/parse_args or main() parameter names is consistent with the updated description so users and build scripts are not misled.

Containerfile (1)
46-56: ⚠️ Potential issue | 🟠 Major
Quote the `uv` version specifier in the pip install command.

Line 23 contains `RUN pip3.12 install uv>=0.7.20`, which the shell parses as `pip3.12 install uv > =0.7.20`, treating `>=` as output redirection. This drops the version constraint, leaving the hermetic build to run against an unpinned `uv` version. The fix is to quote the version specifier:

Minimal fix
```diff
-RUN pip3.12 install uv>=0.7.20
+RUN pip3.12 install "uv>=0.7.20"
```
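For reviewers who want to see the token split firsthand, this is a standalone demonstration in a throwaway directory, with `echo` standing in for `pip3.12`:

```shell
# Work in a scratch directory so the stray file is harmless.
cd "$(mktemp -d)"

# Unquoted: the shell splits 'uv>=0.7.20' into the word 'uv' and a
# redirection to a file literally named '=0.7.20', so the command
# never sees the version constraint.
echo install uv>=0.7.20
cat '=0.7.20'                    # the "redirected" output landed here: install uv

# Quoted: the whole specifier reaches the command as one argument.
echo install "uv>=0.7.20"        # prints: install uv>=0.7.20
```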
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile` around lines 46 - 56, The pip install invocation is parsing the unquoted uv version specifier as shell redirection; update the pip3.12 install command that contains "pip3.12 install uv>=0.7.20" so the version constraint is quoted (e.g., "uv>=0.7.20") to ensure the >= comparator is passed to pip rather than interpreted by the shell; locate the pip3.12 install line in the Containerfile and wrap the uv version specifier in quotes.

requirements-build.txt (1)
45-92: ⚠️ Potential issue | 🔴 Critical
Resolve duplicate package entries in lockfile: 28 duplicates found across build dependencies.

This file contains 28 duplicate package entries with conflicting versions: `hatchling` (1.26.3 vs 1.29.0), `setuptools` (82.0.0 vs 82.0.1), and 26 others. While autogenerated, having multiple versions of the same package creates ambiguity about which version is installed, breaking reproducibility. This suggests a misconfiguration in the `pybuild-deps` dependency resolution. Investigate and clean the `requirements.source.txt.build` input file or adjust the pip-compile invocation to produce a valid lockfile with unique package-version pairs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@requirements-build.txt` around lines 45 - 92, The lockfile contains duplicate package entries (e.g., hatchling 1.26.3 vs 1.29.0, setuptools 82.0.0 vs 82.0.1 and ~26 others) causing ambiguous installs; to fix, inspect and dedupe the source dependency input (requirements.source.txt.build or the pybuild-deps input) and re-run pip-compile (or your pip-tools invocation) so it outputs unique package==version lines, ensuring only one resolved version per package; specifically remove or reconcile conflicting pins in the source file(s) and re-generate requirements-build.txt so entries like hatchling and setuptools appear once with the chosen resolved version.
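To see which pins conflict, a few lines of Python over the lockfile are enough; this is an illustrative sketch (`find_duplicate_pins` and the sample lines are hypothetical, not repo code):

```python
from collections import defaultdict

def find_duplicate_pins(lines):
    """Return {package: sorted versions} for packages pinned more than once."""
    pins = defaultdict(set)
    for line in lines:
        line = line.split("#")[0].strip()   # drop comments
        name, sep, version = line.partition("==")
        if not sep or not version:
            continue                        # skip non-pin lines (e.g. --hash continuations)
        pins[name.strip().lower()].add(version.split()[0])
    return {pkg: sorted(vs) for pkg, vs in pins.items() if len(vs) > 1}

# illustrative sample, not the real lockfile contents
sample = [
    "hatchling==1.26.3",
    "hatchling==1.29.0",
    "setuptools==82.0.1",
]
print(find_duplicate_pins(sample))  # {'hatchling': ['1.26.3', '1.29.0']}
```

Run against the real `requirements-build.txt`, this would list all 28 conflicting packages at once instead of hunting them by eye.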
🟠 Major comments (13)
requirements.hashes.wheel.cpu.aarch64.txt-2-7 (1)
2-7: ⚠️ Potential issue | 🟠 Major
Address mixed wheel source domains across hermetic build requirements.

The `requirements.hashes.wheel.*` files systematically mix `packages.redhat.com` and `files.pythonhosted.org`. If Konflux/Cachi2 policy enforces internal-only or allowlist-restricted artifact sources, this pattern will break hermetic builds or violate supply-chain security controls.

Affected files: `requirements.hashes.wheel.cpu.aarch64.txt`, `requirements.hashes.wheel.cpu.x86_64.txt`, `requirements.hashes.wheel.pypi.cuda.*.txt`, and `requirements.hashes.wheel.pypi.cuda.base.txt` all reference `files.pythonhosted.org`.

Ensure all wheel dependencies are sourced from approved registries or mirror all external wheels to an internal repository before merging.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@requirements.hashes.wheel.cpu.aarch64.txt` around lines 2 - 7, The requirements file mixes external domains (packages.redhat.com and files.pythonhosted.org) which violates the internal-only artifact policy; update the wheel URLs in requirements.hashes.wheel.cpu.aarch64.txt (and the other listed files) so that all wheels (e.g., torch, torchvision, triton) point to an approved internal registry or mirror, or mirror the torchvision wheel from files.pythonhosted.org into the internal repository and replace its URL and hash accordingly; ensure the SHA256 hashes are updated to match the mirrored artifacts and keep the package specifiers (torch, torchvision, triton) intact so pip installs the internal-hosted wheels only.

renovate.json-7-13 (1)
7-13: ⚠️ Potential issue | 🟠 Major
Disabling all Python updates is a security risk.

This rule disables Renovate for the entire Python category (`matchCategories: ["python"]` with `"enabled": false`), which blocks all Python dependency updates, including vulnerability and bugfix updates, per Renovate's documented behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@renovate.json` around lines 7 - 13, The current packageRules entry disables all Python updates by setting "matchCategories": ["python"] with "enabled": false; remove or change that rule so Renovate continues receiving Python security and bugfix updates—either delete the rule or set "enabled": true for the python category, and if you intended to only suppress specific update types create a narrower rule using "matchPackageNames"/"matchPackagePatterns" or "allowedUpdates" to target those packages/versions instead of turning off the entire python category; refer to the existing "packageRules", "matchCategories", and "enabled" keys when making this change.

.github/workflows/build_and_push_dev.yaml-4-5 (1)
4-5: ⚠️ Potential issue | 🟠 Major
Revert branch filter before merging.

The push trigger is changed from `main` to `syedriko-lcore-1440-cuda` (the feature branch). This is likely for testing purposes during development, but it must be reverted to `main` before merging this PR, otherwise the workflow won't trigger on pushes to the main branch.

Proposed fix to restore main branch trigger
```diff
 on:
   push:
-    branches: [ syedriko-lcore-1440-cuda ]
+    branches: [ main ]
   pull_request:
     branches: ["main"]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/build_and_push_dev.yaml around lines 4 - 5, The workflow's push trigger currently lists the feature branch name under the push.branches filter; change the branches array back to the main branch so the workflow runs on pushes to main. Locate the push: branches: entry in the YAML (the current value "syedriko-lcore-1440-cuda") and replace it with "main" (i.e., set push.branches to ["main"]) before merging.

cuda/cuda.repo-8-8 (1)
8-8: ⚠️ Potential issue | 🟠 Major
Enable repository metadata verification.

`gpgcheck=1` verifies RPM payloads, but `repo_gpgcheck=0` skips signature verification for repository metadata. NVIDIA publishes signed repodata for these repositories (verified for both x86_64 and sbsa), so change `repo_gpgcheck=0` to `repo_gpgcheck=1` to enforce consistent cryptographic verification across the supply chain.

Also applies to: 17-17
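For illustration, a stanza with both checks enabled could look like the following sketch (the repo id, URLs, and key path are placeholders, not values from this repo's `cuda.repo`):

```ini
[cuda]
name=CUDA repository (example)
baseurl=https://example.com/cuda/repos/rhel9/x86_64
enabled=1
# verify RPM payload signatures
gpgcheck=1
# also verify the signed repodata
repo_gpgcheck=1
gpgkey=https://example.com/cuda/gpgkey.pub
```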
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cuda/cuda.repo` at line 8, Change the repository metadata verification setting from repo_gpgcheck=0 to repo_gpgcheck=1 to enforce signed repodata verification; update every occurrence of the repo_gpgcheck key (e.g., the existing repo_gpgcheck line and the other occurrence noted) so repository metadata is GPG-checked consistently while leaving gpgcheck as-is for RPM payload verification.

scripts/verify_aarch64_hashes.py-15-18 (1)
15-18: ⚠️ Potential issue | 🟠 Major
Keep the x86_64 file out of the aarch64 verification input.

`req_paths` unions the base, x86_64, and aarch64 files, so this script still passes when the aarch64 digest was accidentally committed only to the x86_64 file. That defeats the point of validating the aarch64 lockfile.

🔧 Narrow the check to the files that actually feed the aarch64 build
```diff
-REQUIREMENTS_ARCH_FILES = [
-    "requirements.hashes.wheel.pypi.cuda.x86_64.txt",
-    "requirements.hashes.wheel.pypi.cuda.aarch64.txt",
-]
+REQUIREMENTS_AARCH64_FILES = [
+    REQUIREMENTS_BASE,
+    "requirements.hashes.wheel.pypi.cuda.aarch64.txt",
+]

-    req_paths = [REQUIREMENTS_BASE, *REQUIREMENTS_ARCH_FILES]
+    req_paths = REQUIREMENTS_AARCH64_FILES
```

Also applies to: 157-160
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/verify_aarch64_hashes.py` around lines 15 - 18, The list REQUIREMENTS_ARCH_FILES currently includes the x86_64 file which lets req_paths union base + x86_64 + aarch64 and masks misplaced aarch64 digests; remove the x86_64 entry so only the aarch64-specific lockfile is used for aarch64 verification (and update any other uses/duplicates of REQUIREMENTS_ARCH_FILES at the other occurrence mentioned to match), ensuring that req_paths is built without unioning the x86_64 file when running the aarch64 check.

scripts/verify_aarch64_hashes.py-82-86 (1)
82-86: ⚠️ Potential issue | 🟠 Major
Request failures currently get reported as a passing check.

`get_aarch64_hashes_from_pypi()` returns `[]` for both "no wheel found" and request errors, and `_aarch64_verify_results()` records that path as `present(..., 0)`. A bad version pin or transient PyPI failure therefore keeps this verifier green.

🔧 Distinguish fetch failures from successful lookups
```diff
-def get_aarch64_hashes_from_pypi(package: str, version: str) -> list[str]:
+def get_aarch64_hashes_from_pypi(package: str, version: str) -> list[str] | None:
     ...
-    except Exception:
-        return []  # package not on PyPI or network error
+    except Exception:
+        return None

-    if not aarch64_hashes:
-        present.append((package, version, 0))
+    if aarch64_hashes is None or not aarch64_hashes:
+        missing.append((package, version, []))
         continue
```

Also applies to: 118-121
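The suggested split between "no wheel found" and "fetch failed" can be sketched in isolation; `fetch` here is a hypothetical stand-in for the PyPI JSON request, and the three fake fetchers simulate the outcomes:

```python
def fetch_wheel_hashes(package, version, fetch):
    """Return [] when the lookup succeeded but no aarch64 wheel exists,
    and None when the lookup itself failed, so callers can tell them apart."""
    try:
        files = fetch(package, version)
    except OSError:  # network/HTTP failure: propagate a sentinel, don't swallow
        return None
    return [f["sha256"] for f in files if "aarch64" in f["filename"]]

def verify(package, version, fetch):
    hashes = fetch_wheel_hashes(package, version, fetch)
    if hashes is None:
        return "error"    # transient PyPI failure must not look green
    return "ok" if hashes else "missing"

def fetch_ok(p, v):
    return [{"filename": "torch-aarch64.whl", "sha256": "abc"}]

def fetch_empty(p, v):
    return []

def fetch_boom(p, v):
    raise OSError("timeout")

print(verify("torch", "2.9.0", fetch_ok))     # ok
print(verify("torch", "2.9.0", fetch_empty))  # missing
print(verify("torch", "2.9.0", fetch_boom))   # error
```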
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/verify_aarch64_hashes.py` around lines 82 - 86, The function get_aarch64_hashes_from_pypi currently masks network/request errors by returning [] for both "no wheel" and fetch failures; change it so that only a legitimate "no wheel found" returns [] and any network/HTTP/URL errors propagate (or raise a specific exception) so callers like _aarch64_verify_results can treat fetch failures as errors. Concretely, replace the broad except Exception around urllib.request.urlopen and json.load with targeted exception handling (e.g., urllib.error.URLError, urllib.error.HTTPError, json.JSONDecodeError) and re-raise or return a sentinel (e.g., None) for fetch failures, and update the calling logic in _aarch64_verify_results to treat that sentinel/exception as a verifier failure rather than as present(..., 0); make the same change for the other identical block at lines 118-121.

.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml-5-12 (1)
5-12: ⚠️ Potential issue | 🟠 Major
Fix the CEL expression and fork URL before merging.

The trigger condition `target_branch == "syedriko-lcore-1440-cuda"` will block this PR because it targets `main`, where `target_branch` will be `"main"`. The hard-coded fork URL also needs to point to the correct repository.

Fix
```diff
-  build.appstudio.openshift.io/repo: https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}
+  build.appstudio.openshift.io/repo: https://github.com/lightspeed-core/rag-content?rev={{revision}}
-  pipelinesascode.tekton.dev/on-cel-expression: event == "pull_request" && target_branch
-    == "syedriko-lcore-1440-cuda"
+  pipelinesascode.tekton.dev/on-cel-expression: event == "pull_request" && target_branch
+    == "main"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml around lines 5 - 12, The pipelinesascode trigger CEL under the key pipelinesascode.tekton.dev/on-cel-expression is incorrect for this PR — update the CEL expression to match the actual target branch (e.g., replace target_branch == "syedriko-lcore-1440-cuda" with target_branch == "main" or make it parameterized to accept the PR's target) so pull_request events against main will fire; also fix the hard-coded fork URL under build.appstudio.openshift.io/repo to point to the correct repository (replace the current https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}} value with the upstream/fork URL intended for this pipeline). Ensure you update only the pipelinesascode.tekton.dev/on-cel-expression and build.appstudio.openshift.io/repo entries so the trigger and repo reference are accurate.

.tekton/lightspeed-core-rag-content-c0ec3-push.yaml-5-11 (1)
5-11: ⚠️ Potential issue | 🟠 Major
Update this push PipelineRun from feature-branch values before merge.

Once merged to `main`, pushes to `main` will have `target_branch == "main"`, so the CEL expression `target_branch == "syedriko-lcore-1440-cuda"` will never match. The repository annotation must also be updated from the fork to the upstream repository.

Fix
```diff
-  build.appstudio.openshift.io/repo: https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}
+  build.appstudio.openshift.io/repo: https://github.com/lightspeed-core/rag-content?rev={{revision}}
-  pipelinesascode.tekton.dev/on-cel-expression: event == "push" && target_branch
-    == "syedriko-lcore-1440-cuda"
+  pipelinesascode.tekton.dev/on-cel-expression: event == "push" && target_branch
+    == "main"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/lightspeed-core-rag-content-c0ec3-push.yaml around lines 5 - 11, Update the PipelineRun annotations so they match the upstream repo and branch: replace the fork repository value in the build.appstudio.openshift.io/repo annotation with the upstream repository URL and change the pipelinesascode.tekton.dev/on-cel-expression condition (the value for pipelinesascode.tekton.dev/on-cel-expression) to reference target_branch == "main" instead of "syedriko-lcore-1440-cuda" so pushes to main will trigger the run.

.tekton/lightspeed-core-rag-content-cpu-f176b-push.yaml-5-11 (1)
5-11: ⚠️ Potential issue | 🟠 Major
Update this push PipelineRun from feature-branch values before merge.

Once this file lands on `main`, the CEL expression `target_branch == "syedriko-lcore-1440-cuda"` will never match pushes to `main` (where `target_branch` will be `"main"`), so the CPU image will not build on the shipping branch. The repo annotation is also still pinned to the fork.

🔧 Minimal fix
```diff
-  build.appstudio.openshift.io/repo: https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}
+  build.appstudio.openshift.io/repo: https://github.com/lightspeed-core/rag-content?rev={{revision}}
-  pipelinesascode.tekton.dev/on-cel-expression: event == "push" && target_branch
-    == "syedriko-lcore-1440-cuda"
+  pipelinesascode.tekton.dev/on-cel-expression: event == "push" && target_branch
+    == "main"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/lightspeed-core-rag-content-cpu-f176b-push.yaml around lines 5 - 11, The push PipelineRun uses a forked repo and a branch-specific CEL filter so it won't run on main; update build.appstudio.openshift.io/repo from the fork URL to the canonical upstream repo, and change or broaden the pipelinesascode.tekton.dev/on-cel-expression so it matches pushes to main (e.g., remove the hard-coded target_branch comparison or include "main") instead of only "syedriko-lcore-1440-cuda"; also ensure build.appstudio.redhat.com/target_branch uses the templated value '{{target_branch}}' as-is so runtime branch matching works (look for the keys pipelinesascode.tekton.dev/on-cel-expression, build.appstudio.openshift.io/repo, and build.appstudio.redhat.com/target_branch to apply the changes).

Makefile-67-71 (1)
67-71: ⚠️ Potential issue | 🟠 Major
Fix the `radon cc` CLI syntax for the grade threshold.

The `-a` flag does not accept a grade argument; `A` is being parsed as a path. Use `-n A` to specify the minimum grade threshold: `radon cc -a -n A src/`.

Diff
```diff
-	@RADON_OUT=$$(uv run radon cc -a A src/) && \
+	@RADON_OUT=$$(uv run radon cc -a -n A src/) && \
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 67 - 71, The Makefile rule capturing RADON_OUT calls "uv run radon cc -a A src/" which incorrectly passes "A" as a path; update the command invoked where RADON_OUT is set (the `uv run radon cc` invocation) to use the correct flag for minimum grade threshold by adding `-n A` (i.e., `radon cc -a -n A src/`) so that the grade filter is applied correctly when computing GRADE from OUTPUT.

scripts/konflux_requirements.sh-150-190 (1)
150-190: 🛠️ Refactor suggestion | 🟠 Major
Guard these manual direct-URL rewrites against resolver drift.

This block emits fixed wheel URLs/hashes for the torch stack and `pylatexenc`, but nothing asserts that the pins resolved above still match them. A future resolver/source change will silently generate hermetic lockfiles that disagree with `requirements.wheel*`.

💡 Fail fast when the hard-coded artifacts drift
```diff
+for expected in "torch==2.9.0" "torchvision==0.24.0" "triton==3.5.0"; do
+  grep -qxF "$expected" "$WHEEL_FILE" "$WHEEL_FILE_PYPI" || {
+    echo "Resolved torch stack no longer matches $expected; update the direct-URL fragments." >&2
+    exit 1
+  }
+done

 uv pip compile "$WHEEL_FILE_PYPI" --refresh --generate-hashes --python-version 3.12 --emit-index-url --no-deps --no-annotate > "$WHEEL_HASH_FILE_PYPI"
+grep -qE '^pylatexenc==2\.10([[:space:]]|$)' "$WHEEL_HASH_FILE_PYPI" || {
+  echo "Resolved pylatexenc pin changed; update PYLATEXENC_CPU_PULP_*." >&2
+  exit 1
+}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/konflux_requirements.sh` around lines 150 - 190, The script writes hard-coded wheel entries (WHEEL_HASH_CPU_X86, WHEEL_HASH_CPU_AARCH) and then regenerates a PyPI lock (WHEEL_HASH_FILE_PYPI) but never verifies that the resolved pins still match the baked-in URLs/hashes (e.g., TV_X86_URL/TV_X86_SHA, TV_AARCH_URL/TV_AARCH_SHA, PYLATEXENC_CPU_PULP_URL/PYLATEXENC_CPU_PULP_SHA256); add a verification step immediately after the uv pip compile and after the pylatexenc rewrite that parses the generated "$WHEEL_HASH_FILE_PYPI" (and the two generated CPU files) and compares the resolved package entries to the expected constants, and if any URL or hash differs, print a clear error describing the mismatched symbol (reference TV_X86_SHA, TV_AARCH_SHA, PYLATEXENC_CPU_PULP_SHA256, WHEEL_HASH_CPU_X86, WHEEL_HASH_CPU_AARCH, WHEEL_HASH_FILE_PYPI) and exit non-zero so the pipeline fails fast.

.tekton/lightspeed-core-rag-content-cpu-f176b-pull-request.yaml-5-12 (1)
5-12: ⚠️ Potential issue | 🟠 Major
Branch target and repo URL appear to be development artifacts.

- Line 5: The repo annotation points to `syedriko/lightspeed-core-rag-content` (personal fork)
- Lines 11-12: The CEL expression targets branch `syedriko-lcore-1440-cuda` instead of `main`

These appear to be leftover from development. For production, this should target the main repository and the appropriate target branch.
🔧 Suggested fix
```diff
   annotations:
-    build.appstudio.openshift.io/repo: https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}
+    build.appstudio.openshift.io/repo: https://github.com/lightspeed-core/rag-content?rev={{revision}}
     build.appstudio.redhat.com/commit_sha: '{{revision}}'
     build.appstudio.redhat.com/pull_request_number: '{{pull_request_number}}'
     build.appstudio.redhat.com/target_branch: '{{target_branch}}'
     pipelinesascode.tekton.dev/cancel-in-progress: "true"
     pipelinesascode.tekton.dev/max-keep-runs: "3"
     pipelinesascode.tekton.dev/on-cel-expression: event == "pull_request" && target_branch
-      == "syedriko-lcore-1440-cuda"
+      == "main"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/lightspeed-core-rag-content-cpu-f176b-pull-request.yaml around lines 5 - 12, The annotations reference a personal fork and a hardcoded feature branch—replace the repo annotation value (build.appstudio.openshift.io/repo) pointing to "syedriko/lightspeed-core-rag-content" with the official production repo URL and change build.appstudio.redhat.com/target_branch and the CEL expression (pipelinesascode.tekton.dev/on-cel-expression) to target the production branch (e.g., "main" or the pipeline's target variable) instead of "syedriko-lcore-1440-cuda" so the pipeline runs against the correct repository and branch; update the pull_request_number/commit_sha templating if needed to keep them generic.

embeddings_model/train_script.py-182-185 (1)
182-185: ⚠️ Potential issue | 🟠 Major
Data loss: first sample from each dataset is discarded.

Line 185 calls `next(dataset)` to determine the column count, but this sample is never used for training. The consumed item is lost.

🐛 Suggested fix
```diff
     datasets = []
+    first_samples = []
     for filepath in filepaths:
         if "reddit_" in filepath:  # Special dataset class for Reddit files
             data_obj = RedditDataset(filepath)
         else:
             data_obj = Dataset(filepath)
-        datasets.append(iter(data_obj))
-
-    # Store if dataset is in a 2 col or 3 col format
-    num_cols = {idx: len(next(dataset)) for idx, dataset in enumerate(datasets)}
+        it = iter(data_obj)
+        first = next(it)
+        first_samples.append(first)
+        datasets.append(it)
+
+    # Store if dataset is in a 2 col or 3 col format (using the first samples we already read)
+    num_cols = {idx: len(sample) for idx, sample in enumerate(first_samples)}
+    # Re-inject first samples by chaining them back
+    from itertools import chain
+    datasets = [chain([first_samples[idx]], ds) for idx, ds in enumerate(datasets)]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@embeddings_model/train_script.py` around lines 182 - 185, The code consumes the first item of each iterator when computing num_cols (using next(dataset)), causing loss of that sample; preserve the first item by capturing it (e.g., first = next(dataset)) and then reinsert it into the iterator (for example by replacing dataset with itertools.chain([first], dataset) or using itertools.tee to duplicate the iterator) before storing in datasets and computing num_cols so no training sample is dropped; ensure references to the variables datasets and num_cols and the place where iter(data_obj) is created are updated accordingly.
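The chain-based re-injection suggested above is a standard peek idiom; a minimal standalone sketch (the `peek` helper and the sample rows are illustrative, not code from the training script):

```python
from itertools import chain

def peek(iterator):
    """Read one item to inspect it, then return (item, restored_iterator)
    so the sample is not lost."""
    first = next(iterator)
    return first, chain([first], iterator)

# illustrative rows, not the real training data
rows = iter([("q", "a"), ("q2", "a2"), ("q3", "a3")])
first, rows = peek(rows)
num_cols = len(first)            # dataset is in 2-column format
remaining = list(rows)           # the peeked row is still included
print(num_cols, len(remaining))  # 2 3
```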
🟡 Minor comments (5)
src/lightspeed_rag_content/okp.py-64-64 (1)
64-64: ⚠️ Potential issue | 🟡 Minor
`metadata_has_url_and_title()` can return non-bool and accepts empty URLs.

At line 64, the expression may return a string (not `bool`) and only checks URL key presence, not URL content. That weakens the function contract and can pass invalid metadata.

🔧 Proposed fix
```diff
 def metadata_has_url_and_title(metadata: dict[str, Any]) -> bool:
 @@
-    return "reference_url" in metadata.get("extra", {}) and metadata.get("title", "").strip()
+    extra = metadata.get("extra", {})
+    reference_url = extra.get("reference_url") if isinstance(extra, dict) else None
+    title = metadata.get("title")
+    return (
+        isinstance(reference_url, str)
+        and bool(reference_url.strip())
+        and isinstance(title, str)
+        and bool(title.strip())
+    )
```
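The truthiness pitfall is easy to demonstrate in isolation; `has_url_and_title_loose` mirrors the flagged expression and `has_url_and_title_strict` the proposed shape (both are illustrative sketches, not the repo's code):

```python
def has_url_and_title_loose(metadata):
    # mirrors the flagged expression: `and` returns its last operand,
    # so the result can be a str (the stripped title), not a bool
    return "reference_url" in metadata.get("extra", {}) and metadata.get("title", "").strip()

def has_url_and_title_strict(metadata):
    # proposed shape: validate content, always return a real bool
    extra = metadata.get("extra", {})
    url = extra.get("reference_url") if isinstance(extra, dict) else None
    title = metadata.get("title")
    return (isinstance(url, str) and bool(url.strip())
            and isinstance(title, str) and bool(title.strip()))

meta = {"extra": {"reference_url": ""}, "title": "Docs"}  # empty URL
print(type(has_url_and_title_loose(meta)))  # <class 'str'> (truthy "Docs")
print(has_url_and_title_strict(meta))       # False
```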
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_rag_content/okp.py` at line 64, The function metadata_has_url_and_title currently can return a non-bool because the right-hand operand returns a stripped title string and it only checks for presence of the "reference_url" key instead of a non-empty URL; change metadata_has_url_and_title to explicitly check that metadata.get("extra", {}).get("reference_url", "") is a non-empty string (after strip()) and that metadata.get("title", "") is a non-empty string (after strip()), and return a boolean (e.g., use bool(...) or both comparisons) so the function always returns True/False; reference symbols: metadata_has_url_and_title, metadata.get(...), "reference_url", "title".

requirements.overrides.txt-22-27 (1)
22-27: ⚠️ Potential issue | 🟡 Minor
Duplicate `hf-xet==1.2.0` entry.

`hf-xet==1.2.0` appears on both line 22 and line 27. The duplicate should be removed to avoid confusion during dependency resolution.

Proposed fix
```diff
 pypdfium2==4.30.0
-# Pin to 1.2.0; we fetch as wheel only (PYPI_WHEELS) — sdists need Rust 1.85+.
-hf-xet==1.2.0
```

If you want to keep the comment explaining why hf-xet is pinned, move it next to the original entry on line 22.

🤖 Prompt for AI Agents
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@requirements.overrides.txt` around lines 22 - 27, Remove the duplicate hf-xet==1.2.0 entry (the second occurrence) from requirements.overrides.txt so the package appears only once; if you want to preserve the explanatory comment about pinning ("Pin to 1.2.0; we fetch as wheel only (PYPI_WHEELS) — sdists need Rust 1.85+."), move that comment next to the remaining hf-xet==1.2.0 line to keep context and avoid duplicate dependency entries.

README.md-37-37 (1)
37-37: ⚠️ Potential issue | 🟡 Minor
Minor grammar fix.
The sentence structure is slightly awkward.
📝 Proposed fix
```diff
-There are prebuilt two images. One with CPU support only (size cca 3.7 GB) and image with CUDA support (size cca 12 GB).
+There are two prebuilt images: one with CPU support only (size ~3.7 GB) and one with CUDA support (size ~12 GB).
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` at line 37, Fix the awkward sentence in the README: replace "There are prebuilt two images. One with CPU support only (size cca 3.7 GB) and image with CUDA support (size cca 12 GB)." with a single clear sentence such as "There are two prebuilt images: one with CPU support only (≈3.7 GB) and one with CUDA support (≈12 GB)." to correct word order and improve readability.

embeddings_model/README.md-107-114 (1)
107-114: ⚠️ Potential issue | 🟡 Minor
Fix typos in documentation.
Several typos were detected by static analysis:
- Line 107, 109: "developped" → "developed"
- Line 114: "intented" → "intended"
- Line 114: "ouptuts" → "outputs"
📝 Proposed fixes
```diff
-We developped this model during the
+We developed this model during the
 [Community week using JAX/Flax for NLP & CV](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104),
-organized by Hugging Face. We developped this model as part of the project:
+organized by Hugging Face. We developed this model as part of the project:

-Our model is intented to be used as a sentence and short paragraph encoder. Given an input text, it ouptuts a vector which captures
+Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector which captures
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@embeddings_model/README.md` around lines 107 - 114, Fix the documentation typos in embeddings_model/README.md: replace both occurrences of "developped" with "developed", change "intented" to "intended", and change "ouptuts" to "outputs" so the paragraph reads correctly (locate the sentence starting "We developped this model..." and the "## Intended uses" paragraph containing "intented" and "ouptuts").
embeddings_model/train_script.py-299-301 (1)
299-301: ⚠️ Potential issue | 🟡 Minor
`input()` call blocks automated/CI runs. If the output folder exists, the script prompts for user input. This will hang indefinitely in automated environments (CI/CD, batch jobs).
Consider using a command-line flag like `--overwrite` or `--force` instead:
+    parser.add_argument('--force', action='store_true', help="Overwrite existing output directory without prompting")
 ...
     if os.path.exists(args.output):
-        print("Output folder already exists.")
-        input("Continue?")
+        if not args.force:
+            print("Output folder already exists. Use --force to overwrite.")
+            sys.exit(1)
+        print("Output folder exists, continuing (--force specified).")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@embeddings_model/train_script.py` around lines 299 - 301, The script currently calls input() when os.path.exists(args.output), which blocks CI; change this to use a CLI flag (e.g., add --overwrite or --force to the argument parser) and replace the interactive prompt with a check: if the output path exists and args.overwrite/args.force is False, exit with a non-zero error (or raise an exception) and print a clear message; if args.overwrite/args.force is True, remove or clear the existing output folder (e.g., shutil.rmtree or safe deletion) before continuing. Update references to args.output and the argument parser (where args is created) and ensure all code paths that wrote to args.output respect the new flag.
🧹 Nitpick comments (17)
src/lightspeed_rag_content/okp.py (1)
21-22: Prefer explicit metadata validation over unchecked nested access.
The current access pattern still trusts metadata shape at runtime. A small helper would make failures explicit and easier to debug.
♻️ Suggested refactor
+def _require_metadata_str(md: dict[str, Any], *path: str) -> str:
+    current: Any = md
+    for key in path:
+        if not isinstance(current, dict) or key not in current:
+            raise ValueError(f"Missing metadata key: {'.'.join(path)}")
+        current = current[key]
+    if not isinstance(current, str) or not current.strip():
+        raise ValueError(f"Invalid metadata value: {'.'.join(path)}")
+    return current
+
 class OKPMetadataProcessor(MetadataProcessor):
 @@
     def url_function(self, file_path: str) -> str:
         """Return the URL for the OKP file."""
         md = parse_metadata(file_path)
-        return cast("str", md["extra"]["reference_url"])
+        return _require_metadata_str(md, "extra", "reference_url")
 @@
     def get_file_title(self, file_path: str) -> str:
         """Return the title of the OKP file."""
         md = parse_metadata(file_path)
-        return cast("str", md["title"])
+        return _require_metadata_str(md, "title")
Also applies to: 144-150
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_rag_content/okp.py` around lines 21 - 22, The code performs unchecked nested access into the metadata structure; add a small helper (e.g., get_required_metadata_field(metadata, key, expected_type) or validate_metadata_shape(metadata, schema)) and use it wherever metadata is read (replace direct metadata[...] or nested casts) so the function explicitly validates presence and types and raises clear errors; update all locations that access metadata (including the block around lines where metadata is currently read) to call this helper before using values.
embeddings_model/.gitattributes (1)
28-28: Use wildcard pattern for safetensors files to match the existing convention.
Line 28 uses `model.safetensors` (exact filename match) while all other patterns in this file use wildcards (e.g., `*.bin`, `*.pt`, `*.pth`). Using `*.safetensors` ensures any future safetensors files are automatically tracked by LFS and keeps the configuration consistent.
♻️ Proposed update
-model.safetensors filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@embeddings_model/.gitattributes` at line 28, Replace the exact filename pattern "model.safetensors" with a wildcard pattern "*.safetensors" in the .gitattributes entry so the LFS attributes (filter=lfs diff=lfs merge=lfs -text) apply to all safetensors files; update the line that currently reads model.safetensors filter=lfs diff=lfs merge=lfs -text to use "*.safetensors" to match the project's existing wildcard conventions.
scripts/list_wheel_requires_dist.py (2)
44-51: No ZIP64 support for very large wheel files.
The script uses 32-bit offsets from the standard EOCD record (signature `PK\x05\x06`). ZIP64 archives use a different EOCD64 locator and record format. While most Python wheels are unlikely to exceed 4GB, this is a limitation to be aware of.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/list_wheel_requires_dist.py` around lines 44 - 51, The current extraction reads EOCD at eocd_i and uses 32-bit fields (cd_off, cd_size) which fails for ZIP64 archives; update the logic around eocd_i/tail to detect a ZIP64 EOCD locator (signature PK\x06\x07) and, when present, parse the ZIP64 EOCD locator and ZIP64 EOCD record to obtain 64-bit cd_off and cd_size before calling _http_range and iterating the central directory; keep the existing 32-bit path as a fallback when no ZIP64 locator is found and ensure the variables cd_off and cd_size are 64-bit-capable so the while loop and struct.unpack_from checks (signature checks for 0x02014B50) work with large archives.
43-47: Hardcoded 70KB tail fetch may fail for wheels with large central directories.
The script fetches the last 70,000 bytes to find the EOCD (End of Central Directory) record. For wheels with many files or long filenames, the central directory could exceed this size, causing the EOCD search to fail.
Consider making this configurable or implementing a fallback that fetches more data if EOCD is not found.
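As a rough illustration of that fallback (not part of the PR; `fetch_range` is a hypothetical stand-in for the script's `_http_range` helper), the tail window can simply be grown until the EOCD signature appears:

```python
# Hypothetical sketch: grow the tail window until the EOCD signature is found,
# instead of assuming a fixed 70 KB tail is always enough.
EOCD_SIG = b"PK\x05\x06"

def find_eocd(fetch_range, total_size, initial=70_000):
    """Return the absolute offset of the EOCD record, or raise ValueError."""
    window = min(initial, total_size)
    while True:
        # fetch_range(start, end) returns bytes for an inclusive byte range,
        # standing in for an HTTP Range request against the wheel URL.
        tail = fetch_range(total_size - window, total_size - 1)
        idx = tail.rfind(EOCD_SIG)
        if idx != -1:
            return total_size - window + idx
        if window == total_size:
            raise ValueError(f"EOCD not found after scanning {window} bytes")
        window = min(window * 2, total_size)

# In-memory stand-in for a remote wheel file:
data = b"x" * 200_000 + EOCD_SIG + b"\x00" * 18
print(find_eocd(lambda s, e: data[s:e + 1], len(data), initial=1_000))  # 200000
```

The doubling keeps the common case (EOCD near the end) cheap while still terminating for archives whose central directory is larger than the initial guess.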
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/list_wheel_requires_dist.py` around lines 43 - 47, The fixed code should stop assuming a 70KB tail always contains the EOCD: replace the hardcoded tail = _http_range(final, total - 70_000, total - 1) logic with a loop/resize strategy that calls _http_range progressively larger (e.g., start with 70_000 then double) until tail.rfind(b"PK\x05\x06") returns a valid index or you've fetched the entire file; use the existing symbols (tail, eocd_i, cd_off, cd_size, _http_range) to locate the logic, ensure cd_off/cd_size unpacking only runs after a valid eocd_i, and if EOCD is still not found raise a clear error explaining the attempted sizes (or make the initial tail size configurable via an argument or constant).
embeddings_model/1_Pooling/config.json (1)
1-7: Consider adding trailing newline.
The configuration is correct for mean-pooling sentence embeddings with 768-dimensional BERT-base vectors. Minor note: the file is missing a trailing newline, which is a POSIX convention and can cause issues with some tools.
📝 Add trailing newline
 {
     "word_embedding_dimension": 768,
     "pooling_mode_cls_token": false,
     "pooling_mode_mean_tokens": true,
     "pooling_mode_max_tokens": false,
     "pooling_mode_mean_sqrt_len_tokens": false
-}
+}
(Ensure file ends with a newline character)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@embeddings_model/1_Pooling/config.json` around lines 1 - 7, Add a POSIX trailing newline to the JSON config file so tools that expect a final newline won't fail; update the file containing keys like "word_embedding_dimension", "pooling_mode_mean_tokens", etc., ensuring the final character is a newline (i.e., the file ends with '\n') without changing any existing JSON content or formatting.
Containerfile (1)
50-55: Fail on unsupported `TARGETARCH` values.
The `*) CPU_WHEEL_ARCH=x86_64` fallback hides platform mismatches and can install the wrong wheel set. It is safer to accept the known aliases explicitly and exit on anything else.
Safer mapping
-    case "${TARGETARCH:-amd64}" in amd64) CPU_WHEEL_ARCH=x86_64 ;; arm64) CPU_WHEEL_ARCH=aarch64 ;; *) CPU_WHEEL_ARCH=x86_64 ;; esac && \
+    case "${TARGETARCH:-amd64}" in \
+        amd64|x86_64) CPU_WHEEL_ARCH=x86_64 ;; \
+        arm64|aarch64) CPU_WHEEL_ARCH=aarch64 ;; \
+        *) echo "Unsupported TARGETARCH: ${TARGETARCH}" >&2; exit 1 ;; \
+    esac && \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile` around lines 50 - 55, The case that sets CPU_WHEEL_ARCH from TARGETARCH should stop silently defaulting to x86_64; update the case around TARGETARCH/Cpu mapping so it only accepts explicit known aliases (e.g., handle amd64 and x86_64 -> x86_64, arm64 and aarch64 -> aarch64) and on any other value print an error and exit non-zero instead of falling through to the x86_64 fallback; adjust the block that currently sets CPU_WHEEL_ARCH and subsequently runs pip install to rely on that strict mapping.
pyproject.toml (1)
1-5: Don't blanket-exclude the whole `embeddings_model` tree if it contains maintained Python code.
These settings suppress Ruff/Mypy/Pyright/Black for everything under `embeddings_model`, not just generated artifacts. If the regeneration script still lives there, narrow the excludes to the JSON/model assets so that executable code stays covered.
Also applies to: 20-35
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` around lines 1 - 5, The current pyproject.toml extend-exclude is excluding the entire embeddings_model tree; narrow that exclusion to only non-code generated assets (e.g., JSON/model files, binaries, and a dedicated generated/assets subdir) instead of the whole package so maintained Python files remain linted and type-checked. Update the [tool.black] extend-exclude pattern (and any equivalent exclude/ignore settings under [tool.ruff], [tool.mypy], [tool.pyright] or similar) to explicitly target file extensions or a generated subdirectory (examples: embeddings_model/**/*.json, embeddings_model/**/*.bin, embeddings_model/generated/**) and remove the blanket embeddings_model exclude so executable modules like embeddings_model/*.py are still checked. Ensure patterns match the regeneration script's output paths and run the linters/type checkers to verify only generated artifacts are skipped.
embeddings_model/README.md (1)
42-44: Add language specifier to fenced code block.
The code block should specify the language for proper syntax highlighting.
📝 Proposed fix
-```
+```shell
 pip install -U sentence-transformers
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@embeddings_model/README.md` around lines 42 - 44, Update the fenced code block in embeddings_model/README.md that contains "pip install -U sentence-transformers" to include a language specifier for proper highlighting (e.g., add "shell" or "bash" after the opening fence), so the block becomes a ```shell block; locate the block by matching its exact content "pip install -U sentence-transformers" and add the language tag to the opening fence.
Containerfile-cuda (2)
50-52: Fragile pip check error handling.
The `pip check` error handling relies on grep patterns to filter output. This approach is fragile because:
1. If pip's output format changes, the grep may fail silently
2. The logic `grep -v hf-xet | grep -q .` could mask legitimate errors if they happen to not match
Consider a more robust approach or at minimum add a comment explaining why `hf-xet` errors are expected and safe to ignore.
♻️ Suggested improvement
-    ( pip check >/tmp/pipcheck.txt 2>&1; rc=$?; cat /tmp/pipcheck.txt; if [ "$rc" -ne 0 ]; then grep "which is not installed" /tmp/pipcheck.txt | grep -v hf-xet | grep -q . && exit 1; fi; exit 0 ); \
+    # pip check may report hf-xet as missing - this is expected because hf-xet is an optional
+    # dependency of huggingface-hub that requires a Rust toolchain not available in hermetic builds.
+    # We filter out hf-xet warnings but fail on any other missing dependency.
+    ( pip check >/tmp/pipcheck.txt 2>&1; rc=$?; cat /tmp/pipcheck.txt; \
+      if [ "$rc" -ne 0 ]; then \
+        grep "which is not installed" /tmp/pipcheck.txt | grep -v hf-xet | grep -q . && exit 1; \
+      fi; exit 0 ); \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile-cuda` around lines 50 - 52, The pip check block that writes to /tmp/pipcheck.txt and then filters via grep (the command starting with "pip check >/tmp/pipcheck.txt 2>&1; rc=$?; cat /tmp/pipcheck.txt; if [ \"$rc\" -ne 0 ]; then grep \"which is not installed\" /tmp/pipcheck.txt | grep -v hf-xet | grep -q . && exit 1; fi; exit 0") is fragile; replace the brittle grep pipeline with a more robust check that (1) captures pip check output to a temp file, (2) searches explicitly for lines containing "which is not installed", (3) excludes known-acceptable package(s) like "hf-xet" using a clear whitelist/allowlist, and (4) exits non-zero if any remaining offending lines exist; also add an inline comment explaining why "hf-xet" is expected/ignored so future maintainers understand the exception.
54-57: Non-hermetic flow removes uv.lock and .venv twice.
In the non-hermetic else branch, `.venv` is created by `uv venv`, then removed with `rm -rf .venv` at line 56, and recreated by `uv sync`. While functional, this is redundant.
♻️ Suggested simplification
     else \
-        uv venv && uv pip install tomlkit && \
-        uv run python ./scripts/remove_pytorch_cpu_pyproject.py pyproject.toml && \
-        rm -f uv.lock && rm -rf .venv && uv lock && uv sync --locked --no-install-project && \
+        uv pip install --system tomlkit && \
+        python3.12 ./scripts/remove_pytorch_cpu_pyproject.py pyproject.toml && \
+        rm -f uv.lock && uv lock && uv sync --locked --no-install-project && \
         uv sync --locked; \
     fi
This avoids creating a temporary venv just to run the script.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Containerfile-cuda` around lines 54 - 57, The Dockerfile creates and then removes a temporary virtualenv (calls to "uv venv" and later "rm -rf .venv") just to run the remove_pytorch_cpu_pyproject script, which is redundant; remove the "uv venv" and the subsequent "rm -rf .venv" steps and instead run the script without creating a venv (e.g., invoke the script with the system python or ensure tomlkit is available via a simple pip install prior to running "./scripts/remove_pytorch_cpu_pyproject.py"), keeping the existing "uv pip install tomlkit", "uv lock", and "uv sync" sequence (references: uv venv, uv pip install tomlkit, uv run python ./scripts/remove_pytorch_cpu_pyproject.py, rm -rf .venv, uv lock, uv sync).
embeddings_model/train_script.py (4)
18-21: Duplicate import of `torch`.
`torch` is imported twice (lines 18 and 21). Remove the duplicate.
🧹 Suggested fix
 import tqdm
 import torch
 from torch import nn
 from torch.utils.data import DataLoader
-import torch
 import torch_xla
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@embeddings_model/train_script.py` around lines 18 - 21, The file currently imports torch twice; remove the duplicate import statement so there's only one "import torch" alongside the existing "from torch import nn" and "from torch.utils.data import DataLoader" to avoid redundancy and lint errors; update the import block in embeddings_model/train_script.py accordingly.
338-339: Use `p.terminate()` and proper exit code.
`p.kill()` sends SIGKILL which doesn't allow the producer process to clean up. Use `p.terminate()` (SIGTERM) for graceful shutdown. Also, prefer `sys.exit(0)` over bare `exit()` for explicit success indication.
🧹 Suggested fix
-    p.kill()
-    exit()
+    p.terminate()
+    p.join(timeout=5)
+    if p.is_alive():
+        p.kill()
+    sys.exit(0)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@embeddings_model/train_script.py` around lines 338 - 339, Replace the abrupt hard-kill and bare exit by calling p.terminate() to send SIGTERM for graceful shutdown of the producer process and then call sys.exit(0) to indicate successful termination; update the code paths using p.kill() and exit() in train_script.py (the block that currently calls p.kill() and exit()) to call p.terminate() and sys.exit(0) instead, and ensure you import sys if not already imported.
246-276: Confusing iteration logic; large datasets never shuffle.
The `Dataset.__iter__` logic is hard to follow:
- When `len(dataset) >= max_dataset_size`, `dataset` is set to `None` (line 267)
- The outer `while` loop (line 251) then re-reads the file from scratch on the next epoch
- Lines 273-276 (the shuffle loop) are only reached for small datasets
This means large datasets (>10M entries) are always read in file order and never shuffled, which may hurt training quality. Consider documenting this behavior or implementing streaming shuffle for large datasets.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@embeddings_model/train_script.py` around lines 246 - 276, The iteration currently makes large datasets (>max_dataset_size) never shuffled because Dataset.__iter__ sets dataset=None and re-reads the file each epoch; fix by implementing a streaming (buffered) shuffle inside __iter__: use dataset as a fixed-size buffer (size = max_dataset_size), fill it from gzip.open(self.filepath) until full, then for each subsequent item pick a random index i in the buffer, yield buffer[i] and replace buffer[i] with the new item; after EOF shuffle the remaining buffer and yield all remaining entries; keep variable names dataset, max_dataset_size, filepath and update the loop logic in Dataset.__iter__ accordingly so both small and large datasets get shuffled without loading everything into memory.
97-97: `queue.get()` without timeout may block indefinitely.
If the producer process crashes or terminates unexpectedly, `queue.get()` will block forever. Consider adding a timeout and handling the `Empty` exception to fail gracefully.
🛡️ Suggested fix
+from queue import Empty
+
 def train_function(index, args, queue):
     ...
     for global_step in tqdm.trange(args.steps, disable=not xm.is_master_ordinal()):
         #### Get the batch data
-        batch = queue.get()
+        try:
+            batch = queue.get(timeout=300)  # 5 minute timeout
+        except Empty:
+            xm.master_print("Queue timeout - producer may have died")
+            raise RuntimeError("Data producer timeout")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@embeddings_model/train_script.py` at line 97, Replace the blocking call to batch = queue.get() with a timed get and proper exception handling: call queue.get(timeout=...) (e.g., a few seconds), catch the Empty exception (queue.Empty or multiprocessing.queues.Empty depending on how queue is imported), and then handle it gracefully (log the timeout, break/return or continue to retry, and ensure any worker/process shutdown logic runs). Update any surrounding loop in the function that consumes the queue to use this pattern and ensure the code imports or references the correct Empty symbol for the queue implementation used.
scripts/konflux_requirements_cuda.sh (3)
23-23: Consider adding `set -o pipefail` for safer pipelines.
The script uses `set -ex` which is good, but pipelines can hide failures. Adding `pipefail` ensures the script exits if any command in a pipeline fails, not just the last one.
🛡️ Suggested improvement
-set -ex
+set -euxo pipefail
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/konflux_requirements_cuda.sh` at line 23, The script currently uses "set -ex" which won't fail if an earlier command in a pipeline fails; update the script's shell options to include pipefail (e.g., add "set -o pipefail" or use "set -e -x -o pipefail") so that any failure inside pipelines causes the script to exit; modify the line that sets shell options (the one containing set -ex) to include pipefail.
303-319: Error message for faiss-cpu goes to stdout, not stderr.
The error message at line 319 (`echo "Could not get faiss-cpu..."`) is printed to stdout. For consistency with typical shell conventions, errors should go to stderr.
🧹 Suggested fix
-" "$faiss_version") || { echo "Could not get faiss-cpu wheel URLs (x86_64 and aarch64) from PyPI for version $faiss_version"; exit 1; }
+" "$faiss_version") || { echo "Could not get faiss-cpu wheel URLs (x86_64 and aarch64) from PyPI for version $faiss_version" >&2; exit 1; }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/konflux_requirements_cuda.sh` around lines 303 - 319, The error message printed when the Python block fails currently goes to stdout; update the failure handler for the FAISS_CPU_SPEC assignment so the echo in the brace block writes to stderr (e.g., change echo "Could not get faiss-cpu wheel URLs (x86_64 and aarch64) from PyPI for version $faiss_version" to echo "..." >&2 or use printf "...\n" >&2) and keep the subsequent exit 1 unchanged; this targets the failure branch that follows the FAISS_CPU_SPEC assignment and ensures errors are emitted to stderr.
343-349: Multiple hardcoded wheel URLs and hashes require manual maintenance.
The script contains hardcoded URLs and SHA256 hashes for torch, torchvision, triton, antlr4, pylatexenc, and deprecated packages. When RHOAI updates these wheels, the hashes will need manual updates. Consider documenting the update process (e.g., how to find new hashes from RHOAI pulp) or adding a helper script to fetch current hashes automatically.
Also applies to: 469-482
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/konflux_requirements_cuda.sh` around lines 343 - 349, The script currently hardcodes wheel URLs and SHA256 hashes (e.g., AARCH64_TORCH_URL, AARCH64_TORCH_HASH, AARCH64_TORCHVISION_URL, AARCH64_TORCHVISION_HASH, AARCH64_TRITON_URL, AARCH64_TRITON_HASH and the other entries later in the file), forcing manual updates whenever RHOAI pulp content changes; replace this by either (A) moving all wheel URLs and hashes into a single external manifest (JSON/YAML) and load it at runtime from scripts/konflux_wheels_manifest.*, or (B) adding a small helper function/script (e.g., fetch_rhoai_wheel_metadata()) that queries the RHOAI pulp API under RHOAI_PULP / RHOAI_PULP_CPU to discover the latest wheel URL and compute/verify its SHA256 automatically, and then have the main install logic use those values; also add a short README note describing how to regenerate the manifest or run the helper so future updates don't require manual hash edits.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: `19154bca-13e6-4f18-bb12-481c5d799570`
📥 Commits
Reviewing files that changed from the base of the PR and between 94246bf6fa8f54135c46b3e9a341cc9f7f36bbfe and 7694a7c17c4a41326393659277c9f89fa4d1fd21.
⛔ Files ignored due to path filters (1)
* `uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (82)
* `.github/workflows/build_and_push_dev.yaml`
* `.github/workflows/build_and_push_dev_cuda.yaml`
* `.github/workflows/pydocstyle.yaml`
* `.pre-commit-config.yaml`
* `.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml`
* `.tekton/lightspeed-core-rag-content-c0ec3-push.yaml`
* `.tekton/lightspeed-core-rag-content-cpu-f176b-pull-request.yaml`
* `.tekton/lightspeed-core-rag-content-cpu-f176b-push.yaml`
* `.tekton/rag-tool-pull-request-cuda.yaml`
* `.tekton/rag-tool-pull-request.yaml`
* `.tekton/rag-tool-push-cuda.yaml`
* `.tekton/rag-tool-push.yaml`
* `Containerfile`
* `Containerfile-cuda`
* `Containerfile-gpu`
* `Makefile`
* `README.md`
* `build-args-konflux-cuda.conf`
* `build-args-konflux.conf`
* `cuda/cuda.repo`
* `cuda/redhat.repo`
* `cuda/rpms.in.yaml`
* `cuda/rpms.lock.yaml`
* `embeddings_model/.gitattributes`
* `embeddings_model/1_Pooling/config.json`
* `embeddings_model/README.md`
* `embeddings_model/config.json`
* `embeddings_model/config_sentence_transformers.json`
* `embeddings_model/data_config.json`
* `embeddings_model/modules.json`
* `embeddings_model/sentence_bert_config.json`
* `embeddings_model/special_tokens_map.json`
* `embeddings_model/tokenizer.json`
* `embeddings_model/tokenizer_config.json`
* `embeddings_model/train_script.py`
* `embeddings_model/version.txt`
* `embeddings_model/vocab.txt`
* `pyproject.toml`
* `renovate.json`
* `requirements-build.cuda.txt`
* `requirements-build.txt`
* `requirements.hashes.source.cuda.txt`
* `requirements.hashes.source.txt`
* `requirements.hashes.wheel.cpu.aarch64.txt`
* `requirements.hashes.wheel.cpu.x86_64.txt`
* `requirements.hashes.wheel.cuda.txt`
* `requirements.hashes.wheel.pypi.cuda.aarch64.txt`
* `requirements.hashes.wheel.pypi.cuda.base.txt`
* `requirements.hashes.wheel.pypi.cuda.x86_64.txt`
* `requirements.hashes.wheel.pypi.txt`
* `requirements.hashes.wheel.txt`
* `requirements.overrides.cuda.txt`
* `requirements.overrides.txt`
* `rpms.in.yaml`
* `rpms.lock.yaml`
* `scripts/distance.py`
* `scripts/generate-rpm-lock.sh`
* `scripts/generate_packages_to_prefetch.py`
* `scripts/konflux_requirements.sh`
* `scripts/konflux_requirements_cuda.sh`
* `scripts/list_wheel_requires_dist.py`
* `scripts/query_rag.py`
* `scripts/remove_pytorch_cpu_pyproject.py`
* `scripts/verify_aarch64_hashes.py`
* `src/lightspeed_rag_content/asciidoc/__main__.py`
* `src/lightspeed_rag_content/document_processor.py`
* `src/lightspeed_rag_content/html/__main__.py`
* `src/lightspeed_rag_content/html/html_reader.py`
* `src/lightspeed_rag_content/metadata_processor.py`
* `src/lightspeed_rag_content/okp.py`
* `src/lightspeed_rag_content/utils.py`
* `tests/asciidoc/test__main__.py`
* `tests/asciidoc/test_asciidoc_converter.py`
* `tests/html_tests/__init__.py`
* `tests/html_tests/test__main__.py`
* `tests/html_tests/test_html_reader.py`
* `tests/test_document_processor.py`
* `tests/test_document_processor_llama_index.py`
* `tests/test_document_processor_llama_stack.py`
* `tests/test_metadata_processor.py`
* `tests/test_utils.py`
* `ubi.repo`
💤 Files with no reviewable changes (2)
* Containerfile-gpu
* ubi.repo
 push:
-  branches: [ main ]
+  branches: [ syedriko-lcore-1440-cuda ]
Branch target must be changed back to main before merge.
The workflow is currently set to trigger on pushes to syedriko-lcore-1440-cuda instead of main. This appears to be a testing configuration that should not be merged to production.
🐛 Proposed fix
on:
push:
- branches: [ syedriko-lcore-1440-cuda ]
+ branches: [ main ]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
 push:
-  branches: [ syedriko-lcore-1440-cuda ]
+  branches: [ main ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/build_and_push_dev_cuda.yaml around lines 4 - 5, The
workflow's push trigger currently targets the feature branch name in the 'push'
-> 'branches' entry; change that branches array value from
"syedriko-lcore-1440-cuda" back to "main" so the workflow runs on pushes to the
production branch rather than the feature branch (update the branches entry used
by the push trigger in the YAML).
model = model.to(device)

# Instantiate optimizer
optimizer = AdamW(params=model.parameters(), lr=2e-5, correct_bias=True)
🧩 Analysis chain
🌐 Web query:
transformers AdamW correct_bias deprecated
💡 Result:
In the Hugging Face Transformers library, the custom AdamW optimizer (from transformers.optimization.AdamW) included a correct_bias parameter (default: True). This controlled whether to apply bias correction in the Adam update (dividing by bias_correction1 and sqrt(bias_correction2)). It was set to False in some cases like BERT TensorFlow to match original implementations. This custom AdamW has been deprecated since around v4.20+ (warnings started ~2022-2023), recommending torch.optim.AdamW instead, or no_deprecation_warning=True to suppress. PyTorch's torch.optim.AdamW always applies bias correction (no correct_bias parameter; equivalent to correct_bias=True). As of March 2025 (PR #36177 merged), Transformers removed its custom AdamW implementation entirely, now re-exporting/importing torch.optim.AdamW directly for backward compatibility. Thus, correct_bias no longer exists; use torch.optim.AdamW which has fixed bias correction. To match old correct_bias=False behavior, implement custom logic or use another optimizer, but PyTorch AdamW is recommended and standard now. Trainer defaults to torch.optim.AdamW via optim="adamw_torch" (previously adamw_hf for custom).
Citations:
- 1: https://discuss.huggingface.co/t/futurewarning-this-implementation-of-adamw-is-deprecated-and-will-be-removed-in-a-future-version-use-the-pytorch-implementation-torch-optim-adamw-instead/32283
- 2: AdamW in HuggingFace is different from AdamW in Pytorch huggingface/transformers#3407
- 3: https://huggingface.co/transformers/v3.0.2/main_classes/optimizer_schedules.html
- 4: https://huggingface.co/docs/transformers/v4.26.1/main_classes/optimizer_schedules
- 5: Two bugs in AdamW huggingface/transformers#14539
- 6: Subtle difference with Pytorch AdamW? huggingface/transformers#35504
- 7: Remove our AdamW implementation huggingface/transformers#36177
- 8: Deprecates AdamW and adds `--optim` huggingface/transformers#14744
- 9: https://huggingface.co/transformers/v4.5.1/main_classes/optimizer_schedules.html
- 10: AdamW implementation huggingface/transformers#22263
- 11: https://docs.pytorch.org/docs/stable/generated/torch.optim.AdamW.html
- 12: https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html
- 13: https://pytorch.org/docs/1.4.0/_modules/torch/optim/adamw.html
- 14: https://github.com/pytorch/pytorch/blob/main/torch/optim/adamw.py
🏁 Script executed:

```shell
head -50 embeddings_model/train_script.py | grep -E "^import|^from"
```

🏁 Script executed:

```shell
sed -n '23,50p' embeddings_model/train_script.py
```

🏁 Script executed:

```shell
find . -maxdepth 2 -type f \( -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" -o -name "setup.cfg" \) | head -10
```

🏁 Script executed:

```shell
cat requirements.txt 2>/dev/null || cat pyproject.toml | grep -A 30 "dependencies\|transformers"
```

🏁 Script executed:

```shell
python3 << 'EOF'
import torch
import inspect

# Check PyTorch AdamW signature
sig = inspect.signature(torch.optim.AdamW.__init__)
print("torch.optim.AdamW parameters:")
for param_name, param in sig.parameters.items():
    print(f"  {param_name}: {param.annotation if param.annotation != inspect.Parameter.empty else 'no annotation'}")

# Try to instantiate with correct_bias to see the error
try:
    dummy_param = torch.nn.Parameter(torch.tensor([1.0]))
    optimizer = torch.optim.AdamW([dummy_param], lr=2e-5, correct_bias=True)
except TypeError as e:
    print("\nError when using correct_bias=True:")
    print(f"  {e}")
EOF
```

🏁 Script executed:

```shell
sed -n '75,85p' embeddings_model/train_script.py
```
Remove correct_bias=True parameter from AdamW instantiation—it no longer exists.
The correct_bias parameter has been removed from transformers.AdamW as of March 2025. Transformers now re-exports torch.optim.AdamW directly, which does not support this parameter. Remove correct_bias=True from line 81; PyTorch's AdamW always applies bias correction by default.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@embeddings_model/train_script.py` at line 81, The AdamW instantiation uses
the removed parameter correct_bias=True; update the optimizer creation by
removing the correct_bias argument so the line assigning optimizer (optimizer =
AdamW(params=model.parameters(), lr=2e-5, ...)) uses the re-exported
torch.optim.AdamW signature without correct_bias. Ensure you keep
params=model.parameters() and lr=2e-5 unchanged and only drop the incorrect
parameter to fix compatibility.
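As a hedged illustration of the fix (the model and learning rate below are placeholders; only the dropped flag matters):

```python
import inspect

import torch

# PyTorch's AdamW has no correct_bias flag; bias correction is always applied.
sig = inspect.signature(torch.optim.AdamW.__init__)
assert "correct_bias" not in sig.parameters

# Before (removed API): AdamW(params=model.parameters(), lr=2e-5, correct_bias=True)
# After: simply drop the flag.
model = torch.nn.Linear(4, 2)  # stand-in for the real embedding model
optimizer = torch.optim.AdamW(params=model.parameters(), lr=2e-5)
print(type(optimizer).__name__)  # AdamW
```

Behavior is unchanged for code that previously passed `correct_bias=True`, since that was already PyTorch's fixed behavior.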
requirements-build.cuda.txt
Outdated
```
beniget==0.4.2.post1
    # via pythran
calver==2025.10.20
    # via trove-classifiers
cffi==2.0.0
    # via cryptography
coherent-licensed==0.5.2
    # via
    #   importlib-metadata
    #   zipp
cython==3.1.8
    # via scipy
cython==3.2.4
    # via
    #   asyncpg
    #   lxml
    #   numpy
    #   oracledb
    #   scikit-learn
    #   shapely
    #   sqlalchemy
```
Critical: Duplicate package declarations with conflicting versions.
The file contains multiple package entries with different version pins:
- `cython` appears at both 3.1.8 (line 7) and 3.2.4 (line 19)
- Similar duplicates exist for `hatchling`, `meson-python`, `numpy`, `setuptools-scm`, and `setuptools`
When pip processes this file, the behavior is undefined—only one version will be installed, but which one is non-deterministic. This breaks build reproducibility and may cause dependency conflicts.
🔍 Recommended fix: Resolve conflicting constraints
Re-generate this lock file with pybuild-deps compile after resolving the upstream dependency conflicts, or manually consolidate to a single version per package that satisfies all downstream requirements.
For example:
```diff
-cython==3.1.8
-    # via scipy
-cython==3.2.4
+cython==3.2.4
     # via
+    #   scipy
     #   asyncpg
     #   lxml
```
# lxml📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```
beniget==0.4.2.post1
    # via pythran
calver==2025.10.20
    # via trove-classifiers
cffi==2.0.0
    # via cryptography
coherent-licensed==0.5.2
    # via
    #   importlib-metadata
    #   zipp
cython==3.2.4
    # via
    #   scipy
    #   asyncpg
    #   lxml
    #   numpy
    #   oracledb
    #   scikit-learn
    #   shapely
    #   sqlalchemy
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@requirements-build.cuda.txt` around lines 7 - 27, The requirements file
contains duplicate package pins with conflicting versions (e.g., cython 3.1.8 vs
3.2.4, and similar duplicates for hatchling, meson-python, numpy,
setuptools-scm, setuptools) which leads to non-deterministic installs; resolve
by consolidating each package to a single, compatible version that satisfies all
downstream constraints (pick the correct version for cython, hatchling,
meson-python, numpy, setuptools-scm, setuptools), update the
requirements-build.cuda.txt to remove duplicate entries, and then regenerate the
lockfile by running the canonical command (e.g., pybuild-deps compile) so the
final file contains one deterministic pin per package.
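Until the lock file is regenerated, duplicate pins are easy to catch mechanically. A minimal sketch (the sample text and output shape are illustrative, not taken from the actual lock file):

```python
import re
from collections import defaultdict

def find_duplicate_pins(text: str) -> dict[str, list[str]]:
    """Map each package pinned more than once (with differing versions)
    to the list of versions found for it in a requirements-style file."""
    pins = defaultdict(list)
    for line in text.splitlines():
        m = re.match(r"^([A-Za-z0-9._-]+)==(\S+)", line)
        if m:
            pins[m.group(1).lower()].append(m.group(2))
    return {pkg: vers for pkg, vers in pins.items() if len(set(vers)) > 1}

sample = """\
cython==3.1.8
    # via scipy
cython==3.2.4
numpy==2.3.5
"""
print(find_duplicate_pins(sample))  # {'cython': ['3.1.8', '3.2.4']}
```

Running something like this in CI against the generated `requirements-build.*.txt` files would make regressions of this kind fail fast instead of surfacing as non-deterministic installs.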
```
hatchling==1.26.3
    # via
    #   hatch-fancy-pypi-readme
    #   llama-cloud
    #   llama-stack-client
    #   openai
hatchling==1.29.0
    # via
    #   annotated-types
    #   attrs
    #   banks
    #   chardet
    #   colorama
    #   einops
    #   expandvars
    #   filelock
    #   fsspec
    #   hatch-fancy-pypi-readme
    #   hatch-vcs
    #   httpcore
    #   jsonschema-specifications
    #   latex2mathml
    #   llama-index
    #   llama-index-cli
    #   llama-index-core
    #   llama-index-embeddings-huggingface
    #   llama-index-embeddings-openai
    #   llama-index-indices-managed-llama-cloud
    #   llama-index-instrumentation
    #   llama-index-llms-openai
    #   llama-index-readers-file
    #   llama-index-readers-llama-parse
    #   llama-index-vector-stores-postgres
    #   mcp
    #   opentelemetry-api
    #   opentelemetry-exporter-otlp-proto-common
    #   opentelemetry-exporter-otlp-proto-http
    #   opentelemetry-proto
    #   opentelemetry-sdk
    #   opentelemetry-semantic-conventions
    #   platformdirs
    #   polyfactory
    #   pydantic-settings
    #   pygments
    #   python-multipart
    #   referencing
    #   scikit-build-core
    #   semchunk
    #   soupsieve
    #   starlette
    #   typing-inspection
    #   uv-dynamic-versioning
```
Critical: Additional duplicate version declarations.
hatchling is pinned to both 1.26.3 (line 71) and 1.29.0 (line 77), creating the same non-deterministic install behavior as flagged above.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@requirements-build.cuda.txt` around lines 71 - 122, The requirements file
contains duplicate pins for the same package "hatchling" (1.26.3 and 1.29.0),
causing non-deterministic installs; remove the older/undesired duplicate entry
so only a single pinned version of hatchling remains (choose and keep the
correct version, e.g., 1.29.0) and regenerate or lint the requirements to ensure
no other duplicate package pins remain.
requirements-build.cuda.txt
Outdated
```
meson-python==0.18.0
    # via scikit-learn
meson-python==0.19.0
    # via
    #   numpy
    #   scipy
meson==1.10.2
    # via meson-python
numpy==2.3.5
    # via
    #   scikit-learn
    #   scipy
numpy==2.4.3
    # via
    #   pythran
    #   scipy
```
Critical: Duplicate meson-python and numpy version pins.
Both meson-python (0.18.0 and 0.19.0) and numpy (2.3.5 and 2.4.3) have conflicting declarations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@requirements-build.cuda.txt` around lines 139 - 154, The requirements file
contains conflicting duplicate pins for meson-python (0.18.0 and 0.19.0) and
numpy (2.3.5 and 2.4.3); resolve by choosing a single consistent version for
each package (or a compatible version range) and remove the duplicate entries so
only one meson-python and one numpy pin remains. Inspect dependency sources that
list the pins (comments referencing scikit-learn, numpy, scipy, pythran) to pick
the version that satisfies all consumers, update the requirements list to the
chosen meson-python and numpy version, and ensure any transitive comments are
adjusted or documented. Make the change where meson-python and numpy are
declared so CI/install uses the single resolved version.
requirements-build.cuda.txt
Outdated
```
setuptools==82.0.0
    # via charset-normalizer
setuptools==82.0.1
    # via
    #   anyio
    #   asyncpg
    #   calver
    #   certifi
    #   cffi
    #   colorlog
    #   cryptography
    #   dill
    #   fire
    #   googleapis-common-protos
    #   greenlet
    #   httpx-sse
    #   importlib-metadata
    #   joblib
    #   llama-stack
    #   llama-stack-api
    #   lxml
    #   markupsafe
    #   maturin
    #   meson
    #   mpmath
    #   multidict
    #   multiprocess
    #   nest-asyncio
    #   oci
    #   oracledb
    #   pathspec
    #   pgvector
    #   pluggy
    #   psutil
    #   pycparser
    #   pyjwt
    #   python-dateutil
    #   python-docx
    #   python-dotenv
    #   python-pptx
    #   pythran
    #   regex
    #   rtree
    #   sentence-transformers
    #   setuptools-rust
    #   setuptools-scm
    #   shapely
    #   shellingham
    #   sqlalchemy
    #   sse-starlette
    #   tabulate
    #   tenacity
    #   tornado
    #   tqdm
    #   tree-sitter
    #   tree-sitter-c
    #   tree-sitter-javascript
    #   tree-sitter-python
    #   trove-classifiers
    #   tzdata
    #   wrapt
    #   yarl
    #   zipp
```
Critical: Duplicate setuptools version pins in the "unsafe" section.
setuptools appears at both 82.0.0 (line 255) and 82.0.1 (line 257).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@requirements-build.cuda.txt` around lines 255 - 317, The requirements list
contains duplicate setuptools pins (setuptools==82.0.0 and setuptools==82.0.1);
remove the older/incorrect pin and keep a single definitive version (e.g.,
setuptools==82.0.1) so only one setuptools entry appears in the "unsafe"
section; update the requirements-build.cuda.txt entry for setuptools to the
chosen version and, if you have a lock/regeneration step, regenerate the pinned
requirements to ensure consistency.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml:
- Around line 234-238: The matrix param blocks use the scalar expansion "-
$(params.build-platforms)" which wraps the array; update each matrix.params
value to use Tekton's whole-array syntax by replacing
"$(params.build-platforms)" with "$(params.build-platforms[*])" for the PLATFORM
matrix and likewise for the image-platform, platform, and image-arch matrix
definitions so they expand the array items correctly.
- Around line 11-12: The pipelinesascode trigger expression currently uses
target_branch == "syedriko-lcore-1440-cuda" which prevents the pipeline from
running for this PR; update the pipelinesascode.tekton.dev/on-cel-expression
value to use target_branch == "main" (replace "syedriko-lcore-1440-cuda" with
"main") so the validation pipeline will run against the repository's main
branch.
In @.tekton/lightspeed-core-rag-content-c0ec3-push.yaml:
- Around line 10-12: Update the CEL filter under
pipelinesascode.tekton.dev/on-cel-expression so it matches pushes to main
instead of the feature branch: replace the hardcoded target_branch ==
"syedriko-lcore-1440-cuda" condition with target_branch == "main" (or a more
generic expression if intended), ensuring the push trigger will fire after this
YAML lands on main; locate the on-cel-expression entry to apply the change.
- Around line 231-235: The matrix blocks for build-images, clair-scan,
ecosystem-cert-preflight-checks, and clamav-scan are using a single-element list
wrapping of the PLATFORM parameter (value: [$(params.build-platforms)]), which
prevents Tekton from unpacking the full array; update each matrix param entry
for name: PLATFORM to use Tekton's whole-array substitution syntax value:
$(params.build-platforms[*]) so the entire build-platforms array is expanded for
the matrix.
In `@requirements.hashes.source.txt`:
- Around line 177-179: The requirements.hashes.source.txt contains an orphaned
top-level pin "circuitbreaker==2.1.3" that is not referenced in
requirements-build.txt, pyproject.toml, or the codebase; remove the
"circuitbreaker==2.1.3" entry (both hashes) from requirements.hashes.source.txt,
ensure you do not touch the transitive build dependency pins like "mcp", "oci",
or "oracledb", and then regenerate or validate your requirements file (and run
the build/tests) to confirm nothing else requires circuitbreaker.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c0ac200f-21bd-48f4-8f06-d9c8c7a8a340
⛔ Files ignored due to path filters (1)
- `uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (13)
- .tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml
- .tekton/lightspeed-core-rag-content-c0ec3-push.yaml
- requirements-build.cuda.txt
- requirements-build.txt
- requirements.hashes.source.cuda.txt
- requirements.hashes.source.txt
- requirements.hashes.wheel.cuda.txt
- requirements.hashes.wheel.pypi.cuda.base.txt
- requirements.hashes.wheel.pypi.txt
- scripts/download_embeddings_model.py
- scripts/generate_embeddings.py
- tests/test_generate_embeddings.py
- tests/test_metadata_processor.py
✅ Files skipped from review due to trivial changes (7)
- tests/test_generate_embeddings.py
- scripts/download_embeddings_model.py
- scripts/generate_embeddings.py
- tests/test_metadata_processor.py
- requirements.hashes.wheel.cuda.txt
- requirements-build.cuda.txt
- requirements.hashes.source.cuda.txt
🚧 Files skipped from review as they are similar to previous changes (1)
- requirements-build.txt
```yaml
pipelinesascode.tekton.dev/on-cel-expression: event == "pull_request" && target_branch
  == "syedriko-lcore-1440-cuda"
```
🧩 Analysis chain
🏁 Script executed:

```shell
# Find the Tekton file in question
find . -name "*lightspeed-core-rag-content-c0ec3-pull-request.yaml" -type f
```

🏁 Script executed:

```shell
# Read the Tekton file to verify the CEL expression at lines 11-12
cat -n ./.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml | head -20
```

🏁 Script executed:

```shell
# Check the repository structure and look for any PR-related files or documentation
git ls-files | head -20
```

🏁 Script executed:

```shell
# Check git branches to understand the branch structure
git branch -a | head -20
```

🏁 Script executed:

```shell
# Check if there are other similar Tekton files and understand the pattern
ls -la .tekton/ | grep -E "\.yaml$"
```

🏁 Script executed:

```shell
# Look for references to the branch name in git
git log --oneline --all | grep -i "syedriko\|lcore-1440" | head -10
```

🏁 Script executed:

```shell
# Check the git status and current branch info to understand PR context
git status
```

🏁 Script executed:

```shell
# Look at other similar pull-request files to see if they have different target branches
grep -n "target_branch" .tekton/*pull-request*.yaml
```

🏁 Script executed:

```shell
# Check the CPU variant's CEL expression to see what branch it targets
sed -n '11,12p' .tekton/lightspeed-core-rag-content-cpu-f176b-pull-request.yaml
```

🏁 Script executed:

```shell
# Check the README or any documentation about the Tekton pipelines
cat README.md | head -50
```

🏁 Script executed:

```shell
# Check if there are any comments or documentation in the Tekton files themselves
head -30 .tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml
```

🏁 Script executed:

```shell
# Let's check if there's a pattern in how this repo organizes Tekton files
# Check the history of this specific file to understand its intent
git log --oneline .tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml | head -5
```

🏁 Script executed:

```shell
# Check the git history to see if syedriko-lcore-1440-cuda is a long-lived branch
git for-each-ref refs/remotes --format='%(refname:short)' | grep -i "syedriko\|lcore"
```
Retarget the PR trigger to main.
This PR's base branch is main, so target_branch == "syedriko-lcore-1440-cuda" prevents the validation pipeline from running. The branch does not exist in the repository's remote tracking branches, and other similar CUDA pipeline files correctly target main.
Suggested change

```diff
-pipelinesascode.tekton.dev/on-cel-expression: event == "pull_request" && target_branch
-  == "syedriko-lcore-1440-cuda"
+pipelinesascode.tekton.dev/on-cel-expression: event == "pull_request" && target_branch
+  == "main"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml around lines 11
- 12, The pipelinesascode trigger expression currently uses target_branch ==
"syedriko-lcore-1440-cuda" which prevents the pipeline from running for this PR;
update the pipelinesascode.tekton.dev/on-cel-expression value to use
target_branch == "main" (replace "syedriko-lcore-1440-cuda" with "main") so the
validation pipeline will run against the repository's main branch.
```yaml
- matrix:
    params:
    - name: PLATFORM
      value:
      - $(params.build-platforms)
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# First, find and read the file to check the current code at specified lines
FILE=".tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml"
if [ -f "$FILE" ]; then
  echo "=== Lines 234-238 ==="
  sed -n '234,238p' "$FILE"
  echo ""
  echo "=== Lines 361-365 ==="
  sed -n '361,365p' "$FILE"
  echo ""
  echo "=== Lines 388-392 ==="
  sed -n '388,392p' "$FILE"
  echo ""
  echo "=== Lines 439-443 ==="
  sed -n '439,443p' "$FILE"
else
  echo "File not found"
fi
```
🌐 Web query:
Tekton matrix params whole-array syntax [*] documentation
💡 Result:
The whole-array syntax `$(params.bar[*])` in Tekton matrix parameters allows passing an entire array parameter as the value for a matrix param, fanning out the PipelineTask for each element in the array. This is a beta feature requiring the `enable-api-fields` feature flag set to "beta". Example from the official documentation:

```yaml
tasks:
  # ...
  - name: task-4
    taskRef:
      name: task-4
    matrix:
      params:
        - name: param-one
          value: $(params.bar[*])  # whole array replacement from array param
```

This syntax supports replacements from Pipeline parameters of type array. If the array is empty, the PipelineTask is skipped. It was enabled via PR #6516 (merged April 2023 in v0.48), which added validation after parameter substitution to allow `$(params.foo[*])`. Similar syntax works for task results: `$(tasks.task-4.results.whole-array[*])`. See the official Tekton Matrix documentation for full details: https://tekton.dev/docs/pipelines/matrix/
Citations:
- 1: https://tekton.dev/docs/pipelines/matrix/
- 2: Add matrix support for using references to entire PipelineRun array parameters tektoncd/pipeline#6516
- 3: https://tekton.dev/docs/pipelines/variables/
Use Tekton's whole-array syntax in these matrix blocks.
Tekton documents whole-array matrix.params replacement as value: $(params.build-platforms[*]). The current - $(params.build-platforms) form diverges from that syntax, so please switch all four matrix blocks to the documented expansion (see tekton.dev). This ensures proper expansion of array parameters instead of wrapping them in an additional list level.
Suggested change

```diff
   matrix:
     params:
     - name: PLATFORM
-      value:
-      - $(params.build-platforms)
+      value: $(params.build-platforms[*])
```

Apply the same edit to the image-platform, platform, and image-arch matrix definitions at lines 361-365, 388-392, and 439-443 respectively.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml around lines 234
- 238, The matrix param blocks use the scalar expansion "-
$(params.build-platforms)" which wraps the array; update each matrix.params
value to use Tekton's whole-array syntax by replacing
"$(params.build-platforms)" with "$(params.build-platforms[*])" for the PLATFORM
matrix and likewise for the image-platform, platform, and image-arch matrix
definitions so they expand the array items correctly.
```yaml
pipelinesascode.tekton.dev/on-cel-expression: event == "push" && target_branch
  == "syedriko-lcore-1440-cuda"
creationTimestamp: null
```
🧩 Analysis chain
🏁 Script executed:

```shell
cat -n .tekton/lightspeed-core-rag-content-c0ec3-push.yaml
```

🏁 Script executed:

```shell
git branch -a
```

🏁 Script executed:

```shell
git log --oneline -5
```
Retarget the push trigger to main.
This CEL filter only matches pushes to syedriko-lcore-1440-cuda. Once this file lands on main, pushes to main will stop matching and the new CUDA push pipeline will stay inert.
Suggested change

```diff
-pipelinesascode.tekton.dev/on-cel-expression: event == "push" && target_branch
-  == "syedriko-lcore-1440-cuda"
+pipelinesascode.tekton.dev/on-cel-expression: event == "push" && target_branch
+  == "main"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.tekton/lightspeed-core-rag-content-c0ec3-push.yaml around lines 10 - 12,
Update the CEL filter under pipelinesascode.tekton.dev/on-cel-expression so it
matches pushes to main instead of the feature branch: replace the hardcoded
target_branch == "syedriko-lcore-1440-cuda" condition with target_branch ==
"main" (or a more generic expression if intended), ensuring the push trigger
will fire after this YAML lands on main; locate the on-cel-expression entry to
apply the change.
```yaml
- matrix:
    params:
    - name: PLATFORM
      value:
      - $(params.build-platforms)
```
🧩 Analysis chain
🏁 Script executed:

```shell
cat -n .tekton/lightspeed-core-rag-content-c0ec3-push.yaml | sed -n '225,245p'
```

🏁 Script executed:

```shell
cat -n .tekton/lightspeed-core-rag-content-c0ec3-push.yaml | sed -n '355,365p'
```

🏁 Script executed:

```shell
cat -n .tekton/lightspeed-core-rag-content-c0ec3-push.yaml | sed -n '382,392p'
```

🏁 Script executed:

```shell
cat -n .tekton/lightspeed-core-rag-content-c0ec3-push.yaml | sed -n '433,443p'
```
🌐 Web query:
Tekton matrix whole-array syntax $(params.build-platforms[*])
💡 Result:
Yes, the syntax `$(params.build-platforms[*])` is the valid Tekton whole-array substitution syntax for using an entire array parameter as a value in a PipelineTask's `matrix.params[].value` field. This fans out the task once, passing the full array as a single parameter value to the underlying Task (which must declare the parameter as `type: array`). Example from official Tekton docs:

```yaml
tasks:
  # ...
  - name: task-4
    taskRef:
      name: task-4
    matrix:
      params:
        - name: param-one
          value: $(params.bar[*])  # whole array replacement from array param
```

This requires:

- `params.build-platforms` to be declared as `type: array` in the Pipeline spec.
- The referenced task parameter (e.g. `param-one`) to be `type: array`.
- The matrix feature flag to be enabled (beta in recent versions).

This support was added via PR #6516 (merged April 2023 in v0.48+), enabling validation after parameter substitution for `$(params.foo[*])` in matrix contexts. Note: this differs from individual elements like `$(params.bar[0])` or static arrays. For results, similar syntax works: `$(tasks.prev-task.results.foo[*])`.
Citations:
- 1: https://tekton.dev/docs/pipelines/matrix/
- 2: Add matrix support for using references to entire PipelineRun array parameters tektoncd/pipeline#6516
- 3: https://tekton.dev/docs/pipelines/variables/
- 4: https://tekton.dev/docs/pipelines/variables
Update matrix blocks to use Tekton's whole-array substitution syntax.
All four matrix blocks in this file use value: [$(params.build-platforms)], which wraps the parameter in a single-element list. Tekton's documented whole-array syntax is value: $(params.build-platforms[*]) (tekton.dev), which unpacks the full array. Update the matrix definitions at lines 231–235 (build-images), 358–362 (clair-scan), 385–389 (ecosystem-cert-preflight-checks), and 436–440 (clamav-scan):
Suggested change

```diff
   matrix:
     params:
     - name: PLATFORM
-      value:
-      - $(params.build-platforms)
+      value: $(params.build-platforms[*])
```

Apply to all four matrix blocks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.tekton/lightspeed-core-rag-content-c0ec3-push.yaml around lines 231 - 235,
The matrix blocks for build-images, clair-scan, ecosystem-cert-preflight-checks,
and clamav-scan are using a single-element list wrapping of the PLATFORM
parameter (value: [$(params.build-platforms)]), which prevents Tekton from
unpacking the full array; update each matrix param entry for name: PLATFORM to
use Tekton's whole-array substitution syntax value: $(params.build-platforms[*])
so the entire build-platforms array is expanded for the matrix.
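This wrapping pattern is also easy to flag mechanically before it reaches review. A minimal stdlib-only sketch (the snippet below mirrors the pipeline YAML above; the function name and detection heuristic are illustrative):

```python
import re

PIPELINE_SNIPPET = """\
    - matrix:
        params:
        - name: PLATFORM
          value:
          - $(params.build-platforms)
"""

def wraps_array_param(snippet: str) -> bool:
    """Detect a matrix param whose value is a one-element list wrapping
    $(params.X) instead of the whole-array form $(params.X[*])."""
    lines = [line.strip() for line in snippet.splitlines()]
    for prev, cur in zip(lines, lines[1:]):
        if prev == "value:" and re.fullmatch(r"- \$\(params\.[\w-]+\)", cur):
            return True
    return False

print(wraps_array_param(PIPELINE_SNIPPET))  # True
```

A check like this could run in CI over `.tekton/*.yaml` so regressions to the scalar-wrapped form fail fast.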
```
circuitbreaker==2.1.3 \
    --hash=sha256:1a4baee510f7bea3c91b194dcce7c07805fe96c4423ed5594b75af438531d084 \
    --hash=sha256:87ba6a3ed03fdc7032bc175561c2b04d52ade9d5faf94ca2b035fbdc5e6b1dd1
```
🧩 Analysis chain
🏁 Script executed:
# First, let's check if pyproject.toml exists and examine its structure
find . -name "pyproject.toml" -type f | head -5Repository: lightspeed-core/rag-content
Length of output: 87
🏁 Script executed:
# Check the runtime dependencies declared in pyproject.toml
grep -A 100 "dependencies\s*=" pyproject.toml | head -50Repository: lightspeed-core/rag-content
Length of output: 1761
🏁 Script executed:
# Look at the specific lines in requirements.hashes.source.txt mentioned in the review
head -180 requirements.hashes.source.txt | tail -10Repository: lightspeed-core/rag-content
Length of output: 789
🏁 Script executed:
# Check lines 291-293, 461-463, 485-516 for the other packages
sed -n '291,293p' requirements.hashes.source.txt
sed -n '461,463p' requirements.hashes.source.txt
sed -n '485,516p' requirements.hashes.source.txtRepository: lightspeed-core/rag-content
Length of output: 3086
🏁 Script executed:
# Look for any # via comments in requirements files to see transitive relationships
grep -n "# via" requirements.hashes.source.txt | grep -E "(circuitbreaker|mcp|oci|oracledb)" | head -20Repository: lightspeed-core/rag-content
Length of output: 53
🏁 Script executed:
# Check if these packages are in requirements.source.txt (the source file)
if [ -f requirements.source.txt ]; then
echo "=== Checking requirements.source.txt ==="
grep -E "^(circuitbreaker|mcp|oci|oracledb)" requirements.source.txt
fiRepository: lightspeed-core/rag-content
Length of output: 53
🏁 Script executed:
# Check requirements-build.txt
if [ -f requirements-build.txt ]; then
echo "=== Checking requirements-build.txt ==="
grep -E "^(circuitbreaker|mcp|oci|oracledb)" requirements-build.txt
fiRepository: lightspeed-core/rag-content
Length of output: 110
🏁 Script executed:
# Let's check what requirements files exist
ls -la | grep -i requirementsRepository: lightspeed-core/rag-content
Length of output: 1376
🏁 Script executed:
# Check if there are # via comments for these packages in the full file
grep -B 1 "^circuitbreaker\|^mcp\|^oci\|^oracledb" requirements.hashes.source.txtRepository: lightspeed-core/rag-content
Length of output: 482
🏁 Script executed:
# Let's see the structure around circuitbreaker to understand if it's marked as transitive
sed -n '170,185p' requirements.hashes.source.txt
🏁 Script executed:
# Check for any mention of these packages in the pyproject.toml file more broadly
grep -i "circuitbreaker\|mcp\|oci\|oracledb" pyproject.toml
🏁 Script executed:
# Let's see if there are any other config files that might define these dependencies
find . -maxdepth 2 -name "*.txt" -o -name "*.toml" -o -name "*.cfg" | head -20
🏁 Script executed:
# Check if these packages might be pulled in by llama-stack or other dependencies
# Let's look at what's declared around those packages
grep -A 2 -B 2 "llama-stack\|llama-index" requirements.hashes.source.txt | head -40
🏁 Script executed:
# Check requirements-build.txt to see if these packages are referenced via "# via"
cat requirements-build.txt
🏁 Script executed:
# Let's check if llama-stack dependencies might include these packages
# First, let's understand the structure - are there any comments after the package lines?
sed -n '177,182p' requirements.hashes.source.txt
sed -n '288,296p' requirements.hashes.source.txt
sed -n '458,466p' requirements.hashes.source.txt
sed -n '482,520p' requirements.hashes.source.txt
🏁 Script executed:
# Search for any comments in the file that explain where these packages come from
grep -i "circuitbreaker\|mcp\|oci\|oracledb" requirements-build.txt requirements-build.cuda.txt 2>/dev/null | head -20
🏁 Script executed:
# Let's check all lines in the file to see if any of these packages have "# via" comments anywhere
grep -n "# via.*\(circuitbreaker\|mcp\|oci\|oracledb\)" requirements.hashes.source.txt
🏁 Script executed:
# Get broader context - maybe these are dependencies of dependencies
# Let's look for any Python file that imports or uses these packages
grep -r "import circuitbreaker\|import mcp\|import oci\|import oracledb" --include="*.py" . 2>/dev/null | head -20
🏁 Script executed:
# Check if llama-stack pulls in any of these as dependencies
# First, let's see all the packages that have "# via llama-stack" in the requirements file
grep "# via.*llama-stack" requirements.hashes.source.txt | head -30
🏁 Script executed:
# Let's see if circuitbreaker has any "# via" comments or comes from anywhere
grep -B 2 -A 2 "^circuitbreaker" requirements.hashes.source.txt
🏁 Script executed:
# Check if there are any "# via" comments for these packages in source requirements file
grep "# via.*circuitbreaker\|# via.*mcp\|# via.*oci\|# via.*oracledb" requirements.hashes.source.txt
🏁 Script executed:
# Check requirements.source.txt to see if these are there
if [ -f requirements.source.txt ]; then
echo "=== requirements.source.txt ==="
grep -E "^circuitbreaker|^mcp|^oci|^oracledb" requirements.source.txt
else
find . -name "requirements.source.txt*" | head -5
fi
🏁 Script executed:
# Check if these might be dependencies of llama-stack packages
# Look for packages that list llama-stack in their # via
grep "# via.*llama" requirements.hashes.source.txt | grep -E "circuitbreaker|mcp|oci|oracledb"
🏁 Script executed:
# Let's verify: are these packages present in the build file as TOP-LEVEL or only as transitive?
echo "=== In requirements-build.txt as top-level ==="
grep "^mcp\|^oci\|^oracledb\|^circuitbreaker" requirements-build.txt
echo ""
echo "=== In requirements-build.txt as indirect (# via) ==="
grep -A 1 "^cython\|^hatchling\|^wheel" requirements-build.txt | grep -E "mcp|oci|oracledb|circuitbreaker"
🏁 Script executed:
# Search across Python source files to see if these are actually used in the repo
echo "=== Searching for imports of these packages in Python files ==="
find . -name "*.py" -type f | xargs grep -l "circuitbreaker\|from mcp\|import mcp\|from oci\|import oci\|from oracledb\|import oracledb" 2>/dev/null | head -10
🏁 Script executed:
# Let's check if these packages might be coming from llama-stack itself
# by looking at what packages depend on them
grep -B 5 "^circuitbreaker\|^mcp\|^oci\|^oracledb" requirements.hashes.source.txt | grep -E "^[a-z].*==" | tail -10
🏁 Script executed:
# Let's see the requirements.source.txt file to understand what's being directly pinned
if [ -f requirements.source.txt ]; then
echo "=== Full requirements.source.txt ==="
cat requirements.source.txt
else
find . -name "*.source.txt*" -type f 2>/dev/null
fi
🏁 Script executed:
# Check for any Python files that might use these packages (broader search)
find . -path ./embeddings_model -prune -o -name "*.py" -type f -print | xargs grep -l "circuitbreaker\|from mcp\|import mcp\|from oci\|import oci\|from oracledb\|import oracledb" 2>/dev/null || echo "No Python files found using these packages"
🏁 Script executed:
# Let's understand the build structure - check if there's a requirements.source.txt.build or similar
find . -name "requirements.source.txt*" -o -name "*.build" -o -name "*compile*" | head -10
🏁 Script executed:
# Let's check if there's a requirements.source.txt file at the root
ls -la requirements.source.txt* 2>/dev/null || echo "No requirements.source.txt file found"
🏁 Script executed:
# Check git history to understand how these packages were added
git log --oneline requirements.hashes.source.txt | head -5
🏁 Script executed:
# Let's look at what's in the git repo regarding these packages
git log -p --all -S "circuitbreaker==2.1.3" -- requirements.hashes.source.txt | head -100
🏁 Script executed:
# Let's check if there's documentation about how the requirements files are generated
find . -name "*.md" -o -name "*.txt" -o -name "*.rst" | xargs grep -l "requirements\|hashcompile" 2>/dev/null | head -5
🏁 Script executed:
# Check if circuitbreaker is a transitive dependency - trace back who depends on it
# by looking at lines around it
awk '/^[a-z]/{pkg=$1} /circuitbreaker/{print NR": "prev; print NR": "$0; print NR": "next} {prev=$0}' requirements.hashes.source.txt | head -30
🏁 Script executed:
# Let's look at more context: what packages are between the last one before circuitbreaker and circuitbreaker itself
sed -n '170,185p' requirements.hashes.source.txt | cat -n
🏁 Script executed:
# Let's verify the exact dependency chain - check what package might bring in circuitbreaker
# by looking at the entire requirements.hashes.source.txt structure
python3 << 'EOF'
import re
with open('requirements.hashes.source.txt', 'r') as f:
content = f.read()
# Find all top-level packages (lines that don't start with whitespace or #)
packages = re.findall(r'^([a-z0-9\-]+)==[0-9\.]+', content, re.MULTILINE)
# Check if our packages are in the top-level list
flagged = ['circuitbreaker', 'mcp', 'oci', 'oracledb']
for pkg in flagged:
if pkg in packages:
print(f"✓ {pkg} is a TOP-LEVEL package in requirements.hashes.source.txt")
else:
print(f"✗ {pkg} is NOT in top-level packages")
# Now check if they have # via comments
lines = content.split('\n')
for i, line in enumerate(lines):
for pkg in flagged:
if line.startswith(f'{pkg}=='):
# Check next lines for # via comments
via_comment = None
for j in range(i+1, min(i+30, len(lines))):
if '# via' in lines[j]:
via_comment = lines[j].strip()
break
elif lines[j].startswith(pkg):
break
if via_comment:
print(f"{pkg}: has comment: {via_comment}")
else:
print(f"{pkg}: NO # via comment (likely a top-level pin)")
EOF
🏁 Script executed:
# Let's also verify by checking if llama-stack itself pulls these in
# Search for what packages might specify these as dependencies on PyPI
# For now, check the README for any context
grep -i "circuitbreaker\|mcp\|oci\|oracledb" README.md
🏁 Script executed:
# Final confirmation: check git to see if these were added intentionally
git log --all --source --remotes -S "circuitbreaker" --oneline | head -5
🏁 Script executed:
# Let's check if circuitbreaker is mentioned anywhere in the build dependencies
grep -r "circuitbreaker" . --include="*.txt" --include="*.toml" --include="*.py" 2>/dev/null
🏁 Script executed:
# Let's also verify by looking at what imports these packages in the build chain
# Check if llama-stack or other dependencies actually require these
python3 << 'EOF'
import re
# Read requirements-build.txt to see what's explicitly listed
with open('requirements-build.txt', 'r') as f:
build_content = f.read()
# Check if our packages appear as top-level or only in # via comments
packages = ['circuitbreaker', 'mcp', 'oci', 'oracledb']
for pkg in packages:
# Check if it's a top-level package in build requirements
if re.search(f'^{pkg}==', build_content, re.MULTILINE):
print(f"✓ {pkg} is TOP-LEVEL in requirements-build.txt")
else:
print(f"✗ {pkg} is NOT top-level in requirements-build.txt")
# Check if mentioned anywhere
if pkg in build_content:
print(f" → {pkg} appears in requirements-build.txt (likely in # via comments)")
else:
print(f" → {pkg} does NOT appear in requirements-build.txt at all")
print("\n=== Context from requirements-build.txt ===")
for pkg in packages:
if pkg in build_content:
lines = build_content.split('\n')
for i, line in enumerate(lines):
if f'# {pkg}' in line or f'# via {pkg}' in line:
print(f"{pkg}: Line {i}: {line.strip()}")
EOF
🏁 Script executed:
# Final check: let's confirm the current state of these packages in the requirements files
echo "=== Checking if these are in pyproject.toml ==="
grep -E "circuitbreaker|mcp|oci|oracledb" pyproject.toml || echo "None found in pyproject.toml"
🏁 Script executed:
# Let's do a final check - search for circuitbreaker in any context to understand where it comes from
find . -type f \( -name "*.txt" -o -name "*.toml" -o -name "*.md" \) | xargs grep -i "circuitbreaker" 2>/dev/null
🏁 Script executed:
# Check if there's any documentation about how these source requirements are generated
head -30 requirements.hashes.source.txt
🏁 Script executed:
# Check if requirements.source.txt.hashcompile exists and what it contains
if [ -f requirements.source.txt.hashcompile ]; then
echo "=== requirements.source.txt.hashcompile ==="
cat requirements.source.txt.hashcompile
else
find . -name "*.hashcompile" -o -name "*source*.txt*" | grep -v hashes | grep -v cuda | head -10
fi
🏁 Script executed:
# Let's verify what the actual source file for hash compilation is
ls -la requirements.source* 2>/dev/null || find . -name "requirements.source*" 2>/dev/null | head -10
🏁 Script executed:
# Let's check the git diff or status to understand what files changed
git diff --name-only HEAD~1 HEAD | grep -i requirements
🏁 Script executed:
# Check the git log for the specific commit that modified requirements.hashes.source.txt
git log -1 --stat | head -40
🏁 Script executed:
# Let's see what the current PR/commit message says
git log -1 --format="%B"
🏁 Script executed:
# Final verification: check if these packages appear anywhere else in the source tree
# to understand their origin
find . -type f \( -name "*.py" -o -name "*.txt" -o -name "*.toml" \) ! -path "./embeddings_model/*" | xargs grep -l "circuitbreaker\|from mcp\|import mcp" 2>/dev/null | head -10
🏁 Script executed:
# Let's see if there's any configuration that explains what packages should be included
find . -name "*.source.txt*" -o -name "*compile*" | xargs cat 2>/dev/null | head -50
🏁 Script executed:
# Check if there are any recent comments or documentation about circuitbreaker
git log --all --grep="circuitbreaker" --oneline 2>/dev/null || echo "No circuitbreaker in git history"
🏁 Script executed:
# Final check: look at the actual requirements.hashes.source.txt structure to see if there's a source file reference
head -2 requirements.hashes.source.txt
tail -5 requirements.hashes.source.txt
Remove the orphaned circuitbreaker==2.1.3 pin.
circuitbreaker appears as a top-level pin in requirements.hashes.source.txt but does not appear in requirements-build.txt, is not declared in pyproject.toml, and is not imported anywhere in the codebase. This is a stale pin.
Note: mcp, oci, and oracledb are correctly included as transitive build dependencies (via hatchling, setuptools, wheel, and cython) and should remain.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@requirements.hashes.source.txt` around lines 177 - 179, The
requirements.hashes.source.txt contains an orphaned top-level pin
"circuitbreaker==2.1.3" that is not referenced in requirements-build.txt,
pyproject.toml, or the codebase; remove the "circuitbreaker==2.1.3" entry (both
hashes) from requirements.hashes.source.txt, ensure you do not touch the
transitive build dependency pins like "mcp", "oci", or "oracledb", and then
regenerate or validate your requirements file (and run the build/tests) to
confirm nothing else requires circuitbreaker.
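The provenance heuristic the verification scripts above apply by hand (a pin with no "# via" comment and no declaration in pyproject.toml is suspect) can be sketched in a few lines. This is an illustrative helper, not part of the repo's tooling; the function names here are made up for the example.

```python
import re

def pins_with_via(lock_path):
    """Map each pinned package in a pip-compile-style lock file to whether
    a '# via' provenance comment follows it."""
    result = {}
    current = None
    with open(lock_path) as f:
        for line in f:
            # A pin line starts at column 0 with "name==version".
            m = re.match(r"([A-Za-z0-9][A-Za-z0-9_.\-]*)==", line)
            if m:
                current = m.group(1).lower()
                result[current] = False
            # Indented '# via ...' lines belong to the most recent pin.
            elif current is not None and line.strip().startswith("# via"):
                result[current] = True
    return result

def orphan_pins(lock_path, declared):
    """Pins with no '# via' provenance that the project never declared."""
    declared = {name.lower() for name in declared}
    return sorted(
        pkg
        for pkg, has_via in pins_with_via(lock_path).items()
        if not has_via and pkg not in declared
    )
```

Run against requirements.hashes.source.txt with the dependency names from pyproject.toml, a pin like circuitbreaker (no via comment, never declared) would surface in the orphan list.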
3e020e4 to
fc45118
Compare
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
requirements-build.txt (2)
158-199: ⚠️ Potential issue | 🔴 Critical: Duplicate setuptools entries with conflicting versions.
The lock file contains two different versions of setuptools:
- setuptools==82.0.0 (line 158)
- setuptools==82.0.1 (line 161)
This is the same critical issue as with hatchling above. Both duplicates must be resolved before this file can be used for installation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@requirements-build.txt` around lines 158 - 199, The file contains duplicate conflicting pins for setuptools (setuptools==82.0.0 and setuptools==82.0.1); remove one and ensure a single consistent pinned version (choose 82.0.1 or the project-preferred version) so the lock has only one setuptools entry, update any direct dependency that required the removed version if necessary, and regenerate the lock/requirements file so the dependency graph and comments reflect the chosen setuptools version (look for the lines containing "setuptools==82.0.0" and "setuptools==82.0.1" to locate and fix the entries).
45-92: ⚠️ Potential issue | 🔴 Critical: Duplicate package entries with conflicting versions.
The lock file contains duplicate entries for two packages with different versions:
- hatchling: 1.26.3 (line 45) and 1.29.0 (line 51)
- setuptools: 82.0.0 and 82.0.1
This is invalid and will cause installation failures, as only one version of each package can be installed at a time. This indicates either a bug in pybuild-deps or unresolved dependency conflicts in the upstream requirements. Regenerate the file or manually resolve the version conflicts.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@requirements-build.txt` around lines 45 - 92, The requirements lock contains duplicate conflicting entries for packages (notably "hatchling" and "setuptools"); resolve by choosing a single version for each package and regenerating the lock so only one entry remains per package: inspect dependency trees that reference hatchling and setuptools, update the top-level constraint(s) or pin the desired versions, then re-run the lock generation (or pybuild-deps) to produce a consistent requirements-build.txt without duplicate lines; ensure the final file contains only the chosen version strings for "hatchling" and "setuptools" and remove any stale duplicate entries.
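The duplicate-entry failure mode above is also mechanically checkable before a lock file is committed. The sketch below is illustrative only (not this repo's tooling): it reduces a lock file to a name-to-versions map and reports any package pinned at more than one version.

```python
import re
from collections import defaultdict

def duplicate_pins(lock_lines):
    """Return {package: sorted versions} for every package that appears
    with more than one pinned version in the given lock-file lines."""
    versions = defaultdict(set)
    for line in lock_lines:
        # Pin lines look like "name==1.2.3 \" with hash continuations below.
        m = re.match(r"([A-Za-z0-9][A-Za-z0-9_.\-]*)==([0-9][^\s\\;]*)", line)
        if m:
            versions[m.group(1).lower()].add(m.group(2))
    return {pkg: sorted(vs) for pkg, vs in versions.items() if len(vs) > 1}
```

Fed the requirements-build.txt described above, a check like this would report hatchling and setuptools with two versions each; an empty result means the lock is installable in that respect.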
♻️ Duplicate comments (4)
.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml (2)
11-12: ⚠️ Potential issue | 🟠 Major: Retarget the PR trigger to main.
The CEL expression targets syedriko-lcore-1440-cuda, but this PR's base branch is main. Once merged, the pipeline won't trigger for PRs to main.
  pipelinesascode.tekton.dev/on-cel-expression: event == "pull_request" && target_branch
-   == "syedriko-lcore-1440-cuda"
+   == "main"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml around lines 11 - 12, Update the pipelinesascode trigger expression so it targets the repository's main branch instead of the feature branch; specifically edit the pipelinesascode.tekton.dev/on-cel-expression value (currently 'event == "pull_request" && target_branch == "syedriko-lcore-1440-cuda"') to use 'main' as the target_branch so the pipeline will run for PRs against main.
234-238: ⚠️ Potential issue | 🟠 Major: Use Tekton's whole-array syntax for matrix params.
The matrix block wraps $(params.build-platforms) in a single-element list instead of using whole-array expansion. This affects all four matrix definitions (lines 234-238, 361-365, 388-392, 439-443).
      matrix:
        params:
          - name: PLATFORM
-           value:
-             - $(params.build-platforms)
+           value: $(params.build-platforms[*])
Apply the same fix to the image-platform, platform, and image-arch matrix definitions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml around lines 234 - 238, The matrix params currently wrap the variable in a single-element list (e.g. params: - name: PLATFORM value: - $(params.build-platforms)); change each to Tekton's whole-array expansion by removing the list and setting the param value to $(params.build-platforms[*]) for the PLATFORM matrix, and apply the same change to the other matrix definitions (image-platform, platform, image-arch), replacing their single-element lists with the corresponding whole-array param expansions.
.tekton/lightspeed-core-rag-content-c0ec3-push.yaml (2)
10-11: ⚠️ Potential issue | 🟠 Major: Retarget the push trigger to main.
The CEL expression targets syedriko-lcore-1440-cuda for push events. Once merged to main, pushes to main won't trigger this pipeline.
  pipelinesascode.tekton.dev/on-cel-expression: event == "push" && target_branch
-   == "syedriko-lcore-1440-cuda"
+   == "main"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/lightspeed-core-rag-content-c0ec3-push.yaml around lines 10 - 11, The CEL trigger expression under pipelinesascode.tekton.dev/on-cel-expression currently matches event == "push" && target_branch == "syedriko-lcore-1440-cuda", so pushes to main won't trigger the pipeline; update the expression to use target_branch == "main" (i.e., replace the branch literal "syedriko-lcore-1440-cuda" with "main") so push events to main activate the pipeline.
231-235: ⚠️ Potential issue | 🟠 Major: Use Tekton's whole-array syntax for matrix params.
Same issue as the PR pipeline: all four matrix blocks need the [*] expansion syntax.
      matrix:
        params:
          - name: PLATFORM
-           value:
-             - $(params.build-platforms)
+           value: $(params.build-platforms[*])
Apply to all matrix definitions at lines 231-235, 358-362, 385-389, and 436-440.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/lightspeed-core-rag-content-c0ec3-push.yaml around lines 231 - 235, The matrix param blocks use single-item expansion (- $(params.build-platforms)) instead of Tekton's whole-array syntax; update each matrix params block where name: PLATFORM currently has value: - $(params.build-platforms) to use value: $(params.build-platforms[*]) (i.e., append the [*] suffix required for whole-array expansion) for all four matrix definitions that declare the PLATFORM param so Tekton receives the full platforms array.
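Because the single-element-list mistake recurs across all four matrix blocks of both pipeline files, it is easy to lint for mechanically. The sketch below is a hypothetical check (not Konflux or Tekton tooling) that scans raw YAML text for a list item whose only content is a bare $(params.X) reference; such an item expands to a single string, whereas value: $(params.X[*]) expands to the whole array.

```python
import re

# Matches a YAML list item whose sole content is a bare Tekton param
# reference, e.g. "        - $(params.build-platforms)". References that
# already carry the [*] suffix do not match, so they are not flagged.
_BARE_PARAM_ITEM = re.compile(r"\s*-\s*\$\(params\.[A-Za-z0-9_.\-]+\)\s*$")

def bare_param_list_items(yaml_text):
    """Return 1-based line numbers of list items wrapping a bare param ref."""
    return [
        n
        for n, line in enumerate(yaml_text.splitlines(), 1)
        if _BARE_PARAM_ITEM.match(line)
    ]
```

Running this over either pipeline file would flag each "- $(params.build-platforms)" line in the four matrix definitions, while the corrected value: $(params.build-platforms[*]) form passes clean.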
🧹 Nitpick comments (2)
.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml (1)
54-60: Minor: inconsistent indentation in prefetch-input JSON.
The requirements_files array has inconsistent indentation (line 60 has extra leading space). This won't break parsing but reduces readability.
     "requirements.hashes.wheel.pypi.cuda.aarch64.txt",
     "requirements.hashes.source.cuda.txt",
-      "requirements.hermetic.txt"
+     "requirements.hermetic.txt"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml around lines 54 - 60, The JSON array "requirements_files" has inconsistent indentation for the final entry "requirements.hermetic.txt"; fix by aligning its leading whitespace with the other array items so all entries under the requirements_files array use the same indentation level (match indentation of "requirements.hashes.source.cuda.txt") to improve readability.
.tekton/lightspeed-core-rag-content-c0ec3-push.yaml (1)
51-57: Minor: inconsistent indentation in prefetch-input JSON.
Same indentation issue as the PR pipeline, on line 57.
     "requirements.hashes.wheel.pypi.cuda.aarch64.txt",
     "requirements.hashes.source.cuda.txt",
-      "requirements.hermetic.txt"
+     "requirements.hermetic.txt"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/lightspeed-core-rag-content-c0ec3-push.yaml around lines 51 - 57, The JSON/YAML array under the "requirements_files" key has inconsistent indentation for the last entry; align the "requirements.hermetic.txt" line with the other entries in the "requirements_files" list so all items share the same indentation and formatting (same quoting/comma style) as the earlier lines to match the prefetch-input formatting used elsewhere.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml:
- Line 5: The repo annotation build.appstudio.openshift.io/repo currently points
to the personal fork URL
"https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}";
update that value to the upstream repository URL
"https://github.com/lightspeed-core/rag-content?rev={{revision}}" so the
annotation targets the correct upstream repo (edit the string value in the
.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml file where
build.appstudio.openshift.io/repo is defined).
In @.tekton/lightspeed-core-rag-content-c0ec3-push.yaml:
- Line 5: The repo annotation value for build.appstudio.openshift.io/repo
currently points to the personal fork
"https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}";
update that annotation to the upstream repository URL (replace the syedriko
owner/path with the upstream org/user and repo name, preserving the
?rev={{revision}} query) so build.appstudio.openshift.io/repo references the
correct upstream repo.
---
Outside diff comments:
In `@requirements-build.txt`:
- Around line 158-199: The file contains duplicate conflicting pins for
setuptools (setuptools==82.0.0 and setuptools==82.0.1); remove one and ensure a
single consistent pinned version (choose 82.0.1 or the project-preferred
version) so the lock has only one setuptools entry, update any direct dependency
that required the removed version if necessary, and regenerate the
lock/requirements file so the dependency graph and comments reflect the chosen
setuptools version (look for the lines containing "setuptools==82.0.0" and
"setuptools==82.0.1" to locate and fix the entries).
- Around line 45-92: The requirements lock contains duplicate conflicting
entries for packages (notably "hatchling" and "setuptools"); resolve by choosing
a single version for each package and regenerating the lock so only one entry
remains per package: inspect dependency trees that reference hatchling and
setuptools, update the top-level constraint(s) or pin the desired versions, then
re-run the lock generation (or pybuild-deps) to produce a consistent
requirements-build.txt without duplicate lines; ensure the final file contains
only the chosen version strings for "hatchling" and "setuptools" and remove any
stale duplicate entries.
---
Duplicate comments:
In @.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml:
- Around line 11-12: Update the pipelinesascode trigger expression so it targets
the repository's main branch instead of the feature branch; specifically edit
the pipelinesascode.tekton.dev/on-cel-expression value (currently 'event ==
"pull_request" && target_branch == "syedriko-lcore-1440-cuda"') to use 'main' as
the target_branch so the pipeline will run for PRs against main.
- Around line 234-238: The matrix params currently wrap the variable in a
single-element list (e.g. params: - name: PLATFORM value: -
$(params.build-platforms)); change each to Tekton's whole-array expansion by
removing the list and setting the param value to a whole-array reference: use value:
$(params.build-platforms[*]) for the PLATFORM matrix and apply the same change to
the other matrix definitions (image-platform, platform, image-arch), replacing
their single-element lists with the corresponding whole-array param expansions.
In @.tekton/lightspeed-core-rag-content-c0ec3-push.yaml:
- Around line 10-11: The CEL trigger expression under
pipelinesascode.tekton.dev/on-cel-expression currently matches event == "push"
&& target_branch == "syedriko-lcore-1440-cuda", so pushes to main won't trigger
the pipeline; update the expression to use target_branch == "main" (i.e.,
replace the branch literal "syedriko-lcore-1440-cuda" with "main") so push
events to main activate the pipeline.
- Around line 231-235: The matrix param blocks use single-item expansion (-
$(params.build-platforms)) instead of Tekton's whole-array syntax; update each
matrix params block where name: PLATFORM currently has value: -
$(params.build-platforms) to use value: $(params.build-platforms[*]) (i.e., append
the [*] expansion suffix as required for
whole-array expansion) for all four matrix definitions that declare the PLATFORM
param so Tekton receives the full platforms array.
---
Nitpick comments:
In @.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml:
- Around line 54-60: The JSON array "requirements_files" has inconsistent
indentation for the final entry "requirements.hermetic.txt"; fix by aligning its
leading whitespace with the other array items so all entries under the
requirements_files array use the same indentation level (match indentation of
"requirements.hashes.source.cuda.txt") to improve readability.
In @.tekton/lightspeed-core-rag-content-c0ec3-push.yaml:
- Around line 51-57: The JSON/YAML array under the "requirements_files" key has
inconsistent indentation for the last entry; align the
"requirements.hermetic.txt" line with the other entries in the
"requirements_files" list so all items share the same indentation and formatting
(same quoting/comma style) as the earlier lines to match the prefetch-input
formatting used elsewhere.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a0206944-4ac5-4f2a-860e-8c6662233dfb
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
- .tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml
- .tekton/lightspeed-core-rag-content-c0ec3-push.yaml
- requirements-build.cuda.txt
- requirements-build.txt
- requirements.hashes.source.cuda.txt
- requirements.hashes.source.txt
- requirements.hashes.wheel.cuda.txt
- requirements.hashes.wheel.pypi.cuda.base.txt
- requirements.hashes.wheel.pypi.txt
- scripts/download_embeddings_model.py
- scripts/generate_embeddings.py
- tests/test_generate_embeddings.py
- tests/test_metadata_processor.py
✅ Files skipped from review due to trivial changes (7)
- scripts/download_embeddings_model.py
- tests/test_generate_embeddings.py
- scripts/generate_embeddings.py
- requirements-build.cuda.txt
- requirements.hashes.source.cuda.txt
- requirements.hashes.wheel.cuda.txt
- tests/test_metadata_processor.py
🚧 Files skipped from review as they are similar to previous changes (1)
- requirements.hashes.source.txt
kind: PipelineRun
metadata:
  annotations:
    build.appstudio.openshift.io/repo: https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}
Update repo annotation to target the correct repository.
The annotation references syedriko/lightspeed-core-rag-content (a personal fork) instead of the upstream lightspeed-core/rag-content. This should be updated before merging.
- build.appstudio.openshift.io/repo: https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}
+ build.appstudio.openshift.io/repo: https://github.com/lightspeed-core/rag-content?rev={{revision}}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml at line 5, The
repo annotation build.appstudio.openshift.io/repo currently points to the
personal fork URL
"https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}";
update that value to the upstream repository URL
"https://github.com/lightspeed-core/rag-content?rev={{revision}}" so the
annotation targets the correct upstream repo (edit the string value in the
.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml file where
build.appstudio.openshift.io/repo is defined).
kind: PipelineRun
metadata:
  annotations:
    build.appstudio.openshift.io/repo: https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}
Update repo annotation to target the correct repository.
Same issue as the PR pipeline—references the personal fork instead of the upstream repository.
- build.appstudio.openshift.io/repo: https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}
+ build.appstudio.openshift.io/repo: https://github.com/lightspeed-core/rag-content?rev={{revision}}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.tekton/lightspeed-core-rag-content-c0ec3-push.yaml at line 5, The repo
annotation value for build.appstudio.openshift.io/repo currently points to the
personal fork
"https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}";
update that annotation to the upstream repository URL (replace the syedriko
owner/path with the upstream org/user and repo name, preserving the
?rev={{revision}} query) so build.appstudio.openshift.io/repo references the
correct upstream repo.
/retest
on:
  push:
-    branches: [ main ]
+    branches: [ syedriko-lcore-1440-cuda ]
is syedriko-lcore-1440-cuda the new main branch?
No, it's there to test image builds triggered by GH actions. I'll remove them before merging.
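If that plan holds, the workflow trigger would simply revert to its pre-PR form before merge (a sketch of the expected end state, inferred from the exchange above):

```yaml
# Expected end state of the GitHub workflow trigger once the
# temporary test branch is removed (inferred from the exchange above).
on:
  push:
    branches: [ main ]
```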
fc45118 to fc0a028
♻️ Duplicate comments (6)
.tekton/lightspeed-core-rag-content-c0ec3-push.yaml (3)
5-5: ⚠️ Potential issue | 🟠 Major: Update repo annotation to target the correct repository.
The annotation references syedriko/lightspeed-core-rag-content (a personal fork) instead of the upstream lightspeed-core/rag-content. This should be corrected before merging.

Suggested fix:

- build.appstudio.openshift.io/repo: https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}
+ build.appstudio.openshift.io/repo: https://github.com/lightspeed-core/rag-content?rev={{revision}}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/lightspeed-core-rag-content-c0ec3-push.yaml at line 5, The repo annotation value in the Tekton pipeline manifest is pointing to a personal fork; update the annotation key build.appstudio.openshift.io/repo in the YAML (the line containing "build.appstudio.openshift.io/repo: https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}") to reference the upstream repository URL https://github.com/lightspeed-core/rag-content?rev={{revision}} so the pipeline uses the correct upstream repo.
231-235: ⚠️ Potential issue | 🟠 Major: Use Tekton's whole-array substitution syntax for matrix params.
The current syntax wraps the array parameter in an additional list level. Per Tekton documentation, use value: $(params.build-platforms[*]) for proper array expansion in matrix params. This applies to all four matrix blocks: build-images (here), clair-scan (lines 358-362), ecosystem-cert-preflight-checks (lines 385-389), and clamav-scan (lines 436-440).

Suggested fix (apply to all four matrix blocks):

  matrix:
    params:
      - name: PLATFORM
-       value:
-         - $(params.build-platforms)
+       value: $(params.build-platforms[*])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/lightspeed-core-rag-content-c0ec3-push.yaml around lines 231 - 235, The matrix param currently nests the array incorrectly; update each matrix block (build-images, clair-scan, ecosystem-cert-preflight-checks, clamav-scan) to use Tekton's whole-array substitution syntax by replacing the current wrapped list form of PLATFORM's value with the whole-array expansion syntax $(params.build-platforms[*]) so the PLATFORM matrix receives the expanded array properly; locate the matrix blocks named build-images, clair-scan, ecosystem-cert-preflight-checks, and clamav-scan and change the PLATFORM param value accordingly.
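As a concrete illustration of the suggested matrix fix, one of the four task entries would end up shaped roughly like this (a sketch only; the task and param names come from the review comment, and the surrounding task fields are omitted):

```yaml
# Sketch of a corrected matrix block (e.g. for the build-images task).
# $(params.build-platforms[*]) expands every element of the array param
# into its own matrix combination, instead of passing a nested
# one-element list.
- name: build-images
  matrix:
    params:
      - name: PLATFORM
        value: $(params.build-platforms[*])
```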
10-11: ⚠️ Potential issue | 🟠 Major: Retarget the push trigger to main.
This CEL filter only matches pushes to syedriko-lcore-1440-cuda. Once this file lands on main, pushes to main will not trigger this pipeline, rendering the CUDA push pipeline inert.

Suggested fix:

  pipelinesascode.tekton.dev/on-cel-expression: event == "push" && target_branch
-   == "syedriko-lcore-1440-cuda"
+   == "main"
pipelinesascode.tekton.dev/on-cel-expression: event == "push" && target_branch - == "syedriko-lcore-1440-cuda" + == "main",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/lightspeed-core-rag-content-c0ec3-push.yaml around lines 10 - 11, The CEL filter under the annotation pipelinesascode.tekton.dev/on-cel-expression currently matches only pushes where target_branch == "syedriko-lcore-1440-cuda"; update that expression (the value of pipelinesascode.tekton.dev/on-cel-expression) so the target_branch is "main" (e.g., event == "push" && target_branch == "main") to ensure pushes to main trigger the pipeline..tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml (3)
234-238: ⚠️ Potential issue | 🟠 Major: Use Tekton's whole-array substitution syntax for matrix params.
The current syntax wraps the array parameter in an additional list level. Per Tekton documentation, use value: $(params.build-platforms[*]) for proper array expansion. This applies to all four matrix blocks in this file: build-images (here), clair-scan (lines 361-365), ecosystem-cert-preflight-checks (lines 388-392), and clamav-scan (lines 439-443).

Suggested fix (apply to all four matrix blocks):

  matrix:
    params:
      - name: PLATFORM
-       value:
-         - $(params.build-platforms)
+       value: $(params.build-platforms[*])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml around lines 234 - 238, The matrix param for PLATFORM is incorrectly wrapped as a nested list; update each matrix block (build-images, clair-scan, ecosystem-cert-preflight-checks, clamav-scan) to use Tekton's whole-array substitution by replacing the current value expression with the array expansion syntax: set the PLATFORM param value to $(params.build-platforms[*]) in each matrix block so the array expands properly at runtime.
5-5: ⚠️ Potential issue | 🟠 Major: Update repo annotation to target the correct repository.
The annotation references syedriko/lightspeed-core-rag-content (a personal fork) instead of the upstream lightspeed-core/rag-content. This should be corrected before merging.

Suggested fix:

- build.appstudio.openshift.io/repo: https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}
+ build.appstudio.openshift.io/repo: https://github.com/lightspeed-core/rag-content?rev={{revision}}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml at line 5, The repo annotation build.appstudio.openshift.io/repo is pointing to the personal fork URL "https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}"; update that value to the upstream repository URL "https://github.com/lightspeed-core/rag-content?rev={{revision}}" so the pipeline targets the correct upstream repo (modify the annotation value in .tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml).
11-12: ⚠️ Potential issue | 🟠 Major: Retarget the PR trigger to main.
This PR's base branch is main, so target_branch == "syedriko-lcore-1440-cuda" prevents the validation pipeline from running on PRs targeting main. This will cause the pipeline to be inert after merge.

Suggested fix:

  pipelinesascode.tekton.dev/on-cel-expression: event == "pull_request" && target_branch
-   == "syedriko-lcore-1440-cuda"
+   == "main"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml around lines 11 - 12, The pipeline trigger expression under pipelinesascode.tekton.dev/on-cel-expression currently matches target_branch == "syedriko-lcore-1440-cuda"; update that expression to match the repository's base branch (main) so the validation pipeline runs for PRs targeting main — specifically edit the on-cel-expression value to use target_branch == "main" (adjust the expression string where it appears) to retarget the PR trigger to main.
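Putting the pull-request fixes together, the retargeted PipelineRun metadata would read roughly as follows (a sketch; only the two affected annotations are shown, and the apiVersion is assumed):

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  annotations:
    # Upstream repository instead of the personal fork
    build.appstudio.openshift.io/repo: https://github.com/lightspeed-core/rag-content?rev={{revision}}
    # Trigger on pull requests targeting main, not the feature branch
    pipelinesascode.tekton.dev/on-cel-expression: event == "pull_request"
      && target_branch == "main"
```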
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In @.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml:
- Around line 234-238: The matrix param for PLATFORM is incorrectly wrapped as a
nested list; update each matrix block (build-images, clair-scan,
ecosystem-cert-preflight-checks, clamav-scan) to use Tekton's whole-array
substitution by replacing the current value expression with the array expansion
syntax: set the PLATFORM param value to $(params.build-platforms[*]) in each
matrix block so the array expands properly at runtime.
- Line 5: The repo annotation build.appstudio.openshift.io/repo is pointing to
the personal fork URL
"https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}";
update that value to the upstream repository URL
"https://github.com/lightspeed-core/rag-content?rev={{revision}}" so the
pipeline targets the correct upstream repo (modify the annotation value in
.tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml).
- Around line 11-12: The pipeline trigger expression under
pipelinesascode.tekton.dev/on-cel-expression currently matches target_branch ==
"syedriko-lcore-1440-cuda"; update that expression to match the repository's
base branch (main) so the validation pipeline runs for PRs targeting main —
specifically edit the on-cel-expression value to use target_branch == "main"
(adjust the expression string where it appears) to retarget the PR trigger to
main.
In @.tekton/lightspeed-core-rag-content-c0ec3-push.yaml:
- Line 5: The repo annotation value in the Tekton pipeline manifest is pointing
to a personal fork; update the annotation key build.appstudio.openshift.io/repo
in the YAML (the line containing "build.appstudio.openshift.io/repo:
https://github.com/syedriko/lightspeed-core-rag-content?rev={{revision}}") to
reference the upstream repository URL
https://github.com/lightspeed-core/rag-content?rev={{revision}} so the pipeline
uses the correct upstream repo.
- Around line 231-235: The matrix param currently nests the array incorrectly;
update each matrix block (build-images, clair-scan,
ecosystem-cert-preflight-checks, clamav-scan) to use Tekton's whole-array
substitution syntax by replacing the current wrapped list form of PLATFORM's
value with the whole-array expansion syntax $(params.build-platforms[*]) so the
PLATFORM matrix receives the expanded array properly; locate the matrix blocks
named build-images, clair-scan, ecosystem-cert-preflight-checks, and clamav-scan
and change the PLATFORM param value accordingly.
- Around line 10-11: The CEL filter under the annotation
pipelinesascode.tekton.dev/on-cel-expression currently matches only pushes where
target_branch == "syedriko-lcore-1440-cuda"; update that expression (the value
of pipelinesascode.tekton.dev/on-cel-expression) so the target_branch is "main"
(e.g., event == "push" && target_branch == "main") to ensure pushes to main
trigger the pipeline.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e68b3fde-a194-4ef7-ba9c-8da6cc0248a0
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
- .tekton/lightspeed-core-rag-content-c0ec3-pull-request.yaml
- .tekton/lightspeed-core-rag-content-c0ec3-push.yaml
- Containerfile
- Containerfile-cuda
- Gemfile
- requirements-build.cuda.txt
- requirements-build.txt
- requirements.hashes.source.cuda.txt
- requirements.hashes.source.txt
- requirements.hashes.wheel.cuda.txt
- requirements.hashes.wheel.pypi.cuda.base.txt
- requirements.hashes.wheel.pypi.txt
- scripts/download_embeddings_model.py
- scripts/generate_embeddings.py
- tests/test_generate_embeddings.py
- tests/test_metadata_processor.py
✅ Files skipped from review due to trivial changes (11)
- scripts/download_embeddings_model.py
- tests/test_generate_embeddings.py
- requirements.hashes.wheel.cuda.txt
- scripts/generate_embeddings.py
- Gemfile
- requirements-build.cuda.txt
- tests/test_metadata_processor.py
- requirements.hashes.source.cuda.txt
- Containerfile-cuda
- requirements.hashes.source.txt
- requirements-build.txt
🚧 Files skipped from review as they are similar to previous changes (1)
- Containerfile
e4b0f2f to 7047b93
e5f46fd to d76b6ab
d76b6ab to 0b6b85a
Description
This PR enables hermetic Konflux builds of the CUDA rag-content build images for both the x86_64 and aarch64 architectures.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Documentation
Chores