LCORE-1424: sqlite-vec deprecation #102
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID:
⛔ Files ignored due to path filters (1)
📒 Files selected for processing (14)
💤 Files with no reviewable changes (5)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (4)
Walkthrough
This PR removes SQLite-vec as a supported vector store option across the project, eliminating the dependency, updating CLI argument choices, removing related documentation, cleaning up test cases, and updating CI/CD configuration and build requirements accordingly.
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
229-246: ⚠️ Potential issue | 🟠 Major
Inconsistency: Documentation removes SQLite-vec but source code still supports it.
The README now states that Llama-Stack vector stores support "Faiss or Pgvector", but the relevant code snippets show:
- `utils.py` still lists `"llamastack-sqlite-vec"` as a valid `--vector-store-type` choice
- `document_processor.py` docstring states: "supports 2 llama-stack Vector IO providers: faiss and sqlite-vec"
- `document_processor.py` has `VECTOR_IO_CONFIG_TEMPLATE_FOR_SQLITE` configuration
This creates a situation where users may still use `--vector-store-type llamastack-sqlite-vec` successfully at runtime, but find no documentation for it. Either:
- Update source code to remove sqlite-vec support entirely, or
- Add a deprecation notice in the CLI help text if sqlite-vec should remain functional but discouraged
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 229 - 246, The code/documentation mismatch around "llamastack-sqlite-vec" should be resolved by marking sqlite-vec deprecated rather than removing it: update the CLI help for the --vector-store-type choice in utils.py to include a clear deprecation note for "llamastack-sqlite-vec" (e.g., "(deprecated)"), update the document_processor.py module docstring to state that sqlite-vec is deprecated and will be removed in a future release, and add a short runtime warning when VECTOR_IO_CONFIG_TEMPLATE_FOR_SQLITE is used (or when vector-store-type == "llamastack-sqlite-vec") to inform users and point them to Faiss/Pgvector alternatives; ensure the strings and warning are consistent across utils.py, document_processor.py, and wherever VECTOR_IO_CONFIG_TEMPLATE_FOR_SQLITE is referenced.
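A minimal sketch of the deprecation-notice option, assuming an argparse-based CLI. The parser, the `SUPPORTED` and `DEPRECATED` lists, and the `llamastack-faiss` and `llamastack-pgvector` choice names are illustrative assumptions; only `--vector-store-type` and `llamastack-sqlite-vec` are taken from the review above.

```python
import argparse
import warnings

# Assumed choice names; only "llamastack-sqlite-vec" is quoted in the review.
SUPPORTED = ["llamastack-faiss", "llamastack-pgvector"]
DEPRECATED = ["llamastack-sqlite-vec"]


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser that still accepts the deprecated choice."""
    parser = argparse.ArgumentParser(prog="rag-tool-sketch")
    parser.add_argument(
        "--vector-store-type",
        choices=SUPPORTED + DEPRECATED,
        default=SUPPORTED[0],
        help="Vector store backend; llamastack-sqlite-vec is deprecated.",
    )
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse arguments and warn at runtime when a deprecated backend is chosen."""
    args = build_parser().parse_args(argv)
    if args.vector_store_type in DEPRECATED:
        warnings.warn(
            f"{args.vector_store_type} is deprecated; use Faiss or Pgvector instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    return args
```

Keeping the deprecated value in `choices` lets existing invocations keep working while the help text and the warning steer users to the documented backends.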
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.tekton/rag-tool-pull-request.yaml:
- Line 66: The packages list in the Tekton prefetch config (the "packages" entry
in .tekton/rag-tool-pull-request.yaml) is missing latex2mathml while
requirements.hashes.source.txt and uv.lock still reference it; update that
"packages" string to include latex2mathml (add it among the comma-separated
values) so the binary.packages prefetch matches the lock/requirements files and
prevents hermetic build failures.
In `@requirements-build.txt`:
- Line 54: The comment warns that the package latex2mathml is still present in
the resolved dependency graph; fix by making the dependency state consistent:
either fully remove latex2mathml from the build by deleting it from the lock and
hash manifests (uv.lock and requirements.hashes.source.txt) and verify there are
no upstream references left, or keep it and add it back into the hermetic
prefetch configuration so all three places match; update any reference-checking
scripts or prefetch list that mention hermetic prefetch so they reference the
same package list and re-run dependency resolution to ensure consistency.
In `@scripts/konflux_requirements.sh`:
- Line 22: You removed the package from PYPI_WHEELS but deprecation is
incomplete: search for the string literals "llamastack-sqlite-vec" and
"sqlite-vec" and remove or replace all imports, conditional branches,
installation/requirement references, and test assertions that reference them;
specifically update the PYPI_WHEELS variable usage, the helper code in the utils
module (functions in lightspeed_rag_content.utils), the DocumentProcessor
class/logic in lightspeed_rag_content.document_processor, the query helper code
used by query_rag, and the failing tests (test_utils, test_document_processor,
test_document_processor_llama_stack) to either remove dependency on sqlite-vec
or mock/replace with the supported alternative so tests and scripts no longer
reference those package names. Ensure no leftover literal references remain in
source, tests, or scripts and adjust any documentation or comments accordingly.
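The consistency check requested for the prefetch list could be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the comma-separated `packages` value and the `name==version` requirement lines are modelled on the formats quoted in this review, and whether a reported mismatch is a real problem depends on how the hermetic build is configured.

```python
import re


def normalize(name: str) -> str:
    """Normalize a package name so 'docling_core' and 'docling-core' compare equal."""
    return re.sub(r"[-_.]+", "-", name.strip()).lower()


def pinned_names(requirements_text: str) -> set[str]:
    """Collect package names from 'name==version' lines, skipping '--hash' lines."""
    names = set()
    for line in requirements_text.splitlines():
        match = re.match(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==", line.strip())
        if match:
            names.add(normalize(match.group(1)))
    return names


def missing_from_prefetch(packages_csv: str, requirements_text: str) -> set[str]:
    """Return pinned packages that do not appear in the comma-separated prefetch list."""
    prefetch = {normalize(p) for p in packages_csv.split(",") if p.strip()}
    return pinned_names(requirements_text) - prefetch
```

An empty result means every pinned name also appears in the prefetch list; any returned name is a candidate for the kind of mismatch the comments above describe.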
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 876ac79c-c224-4e66-b65a-091d4e936e4e
⛔ Files ignored due to path filters (1)
- uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
- .tekton/rag-tool-pull-request.yaml
- .tekton/rag-tool-push.yaml
- README.md
- pyproject.toml
- requirements-build.txt
- requirements.hashes.source.txt
- requirements.hashes.wheel.pypi.txt
- requirements.hashes.wheel.txt
- scripts/konflux_requirements.sh
💤 Files with no reviewable changes (3)
- pyproject.toml
- requirements.hashes.wheel.txt
- requirements.hashes.wheel.pypi.txt
  # extra wheels to be included in the wheel list, often come from build-time dependencies
  EXTRA_WHEELS="uv-build,uv,pip,maturin"
- PYPI_WHEELS="opencv-python,omegaconf,rapidocr,sqlite-vec,griffe,griffecli,griffelib,pyclipper,tree-sitter-typescript"
+ PYPI_WHEELS="opencv-python,omegaconf,rapidocr,griffe,griffecli,griffelib,pyclipper,tree-sitter-typescript"
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify sqlite-vec references remaining in source code
rg -n "sqlite-vec|sqlite_vec" --type py
Repository: lightspeed-core/rag-content
Length of output: 1094
Approve removal from PYPI_WHEELS, but flag incomplete sqlite-vec deprecation.
The removal from scripts/konflux_requirements.sh is correct and aligns with deprecation goals. However, multiple remaining references to llamastack-sqlite-vec and sqlite-vec throughout the codebase indicate the deprecation is incomplete:
- Source code: `src/lightspeed_rag_content/utils.py:128`, `src/lightspeed_rag_content/document_processor.py:306,325`
- Tests: `tests/test_utils.py:39`, `tests/test_document_processor.py:123`, `tests/test_document_processor_llama_stack.py:214,257,260`
- Scripts: `scripts/query_rag.py:291,322,323`
For full deprecation, remove all references from source files, tests, and helper scripts to maintain consistency.
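The reference search above can also be reproduced without `rg`. The following is a rough Python equivalent of the `rg -n "sqlite-vec|sqlite_vec" --type py` command, offered as a sketch: `find_references` is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path

# Same alternation as the rg pattern used in the analysis script.
PATTERN = re.compile(r"sqlite-vec|sqlite_vec")


def find_references(root: Path, suffix: str = ".py") -> list[tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for every matching line."""
    hits = []
    for path in sorted(root.rglob(f"*{suffix}")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits
```

Run against the repository root, an empty list would indicate that no Python file still mentions the backend; a check like this could serve as a guard once the references are removed.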
87a6c75 to 3579667
Actionable comments posted: 1
♻️ Duplicate comments (1)
scripts/konflux_requirements.sh (1)
22-22: ⚠️ Potential issue | 🟠 Major
Complete sqlite-vec deprecation in runtime/test paths before removing its packaging path.
Line 22 removes sqlite-vec from wheel handling, but sqlite-vec is still an accepted backend in code/tests (`src/lightspeed_rag_content/document_processor.py` Line 323, `tests/test_document_processor.py` Line 122, `tests/test_document_processor_llama_stack.py` Line 210). This leaves a declared backend path without aligned dependency support.
#!/bin/bash
set -euo pipefail
echo "Remaining sqlite-vec references (code/tests/docs/scripts):"
rg -n -C2 'llamastack-sqlite-vec|sqlite-vec|inline::sqlite-vec' src tests scripts README.md
echo
echo "Vector store type checks that still include sqlite-vec:"
rg -n -C3 'vector_store_type.*sqlite|llamastack-sqlite-vec' src tests
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/konflux_requirements.sh` at line 22, The PYPI_WHEELS list was changed to remove sqlite-vec but code/tests still reference the backend; either restore packaging or fully remove the backend references. Fix by one of two ways: (A) re-add "sqlite-vec" (or the exact wheel name used at runtime) into the PYPI_WHEELS string so packaging provides the dependency, or (B) remove/branch out all runtime and test references to the backend—search for "llamastack-sqlite-vec", "sqlite-vec", and "inline::sqlite-vec" and update logic in src/lightspeed_rag_content/document_processor.py (the backend selection/handling around the sqlite-vec case) and the tests tests/test_document_processor.py and tests/test_document_processor_llama_stack.py to stop expecting that backend. Ensure vector_store_type checks and any imports are aligned with whichever option you choose.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@requirements.hashes.source.txt`:
- Around line 189-191: Run full regression and integration tests for the
dependency bumps to docling-core and llama-index family: execute the test suite
exercising document_processor.py and html_reader.py (including integration tests
that call Settings, MarkdownNodeParser, HuggingFaceEmbedding, FaissVectorStore,
PGVectorStore and the docling.DocumentConverter.export_to_markdown path) rather
than relying only on mocks, validate that ingestion, markdown export, embedding
creation and vector store indexing behave as expected, update or expand
mocks/tests where behavior changed, and if any incompatibilities are found
adjust code or pin versions accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 888961f2-cdf5-465b-8253-0c23fe9210cd
⛔ Files ignored due to path filters (1)
- uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
- .tekton/rag-tool-pull-request.yaml
- .tekton/rag-tool-push.yaml
- README.md
- pyproject.toml
- requirements-build.txt
- requirements.hashes.source.txt
- requirements.hashes.wheel.pypi.txt
- requirements.hashes.wheel.txt
- scripts/konflux_requirements.sh
💤 Files with no reviewable changes (3)
- requirements.hashes.wheel.pypi.txt
- requirements.hashes.wheel.txt
- pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (1)
- .tekton/rag-tool-pull-request.yaml
docling-core==2.70.0 \
    --hash=sha256:1f4d17596ec5cb510c3dd991dd9850dcd4d2a3b858d4534b34833a33838c8f6e \
    --hash=sha256:becfde023b620643206e9cbc3a9a826a6520acc5f9bb11dccf8f70be631aee54
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Bumped dependency pins in lockfile:"
rg -n '^(docling-core|llama-index(|-core|-cli|-embeddings-huggingface|-embeddings-openai|-indices-managed-llama-cloud|-instrumentation|-llms-openai|-readers-file|-readers-llama-parse|-vector-stores-faiss|-vector-stores-postgres))==' requirements.hashes.source.txt
echo
echo "Likely impacted call sites:"
rg -n -C2 'from llama_index|Settings\.|resolve_llm|MarkdownNodeParser|DocumentConverter|export_to_markdown' src tests
Repository: lightspeed-core/rag-content
Length of output: 11871
Run regression testing for docling-core and llama-index family dependency updates before merge.
These version bumps directly affect core document ingestion and vector indexing APIs:
- `document_processor.py` uses `Settings` configuration, `MarkdownNodeParser`, `HuggingFaceEmbedding`, `FaissVectorStore`, and `PGVectorStore` from the updated packages
- `html_reader.py` depends on `docling.DocumentConverter` and its `export_to_markdown()` method
- The changes span 13 related packages across multiple minor versions
Existing test coverage includes mocks of these APIs, but actual integration compatibility should be validated to catch any breaking changes in the ingestion pipeline.
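To scope such regression testing, it can help to list exactly which pins moved between two versions of a hashed requirements file. The sketch below assumes the `name==version \` line format shown in the diff above; `parse_pins` and `changed_pins` are hypothetical helper names.

```python
import re

# Matches 'name==version' and stops the version at whitespace or a trailing backslash.
PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([^\s\\]+)")


def parse_pins(text: str) -> dict[str, str]:
    """Map lower-cased package names to their pinned versions."""
    pins = {}
    for line in text.splitlines():
        match = PIN.match(line.strip())
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def changed_pins(old_text: str, new_text: str) -> dict[str, tuple[str, str]]:
    """Return {name: (old_version, new_version)} for packages whose pin changed."""
    old, new = parse_pins(old_text), parse_pins(new_text)
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }
```

Feeding it the file contents from the base and head commits yields the bumped packages, which can then be matched against the call sites the analysis script lists.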
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lightspeed_rag_content/document_processor.py (1)
340-341: ⚠️ Potential issue | 🟡 Minor
Stale comment references removed sqlite-vec.
The comment mentions `sqlitevec_store.db` but sqlite-vec support is being removed. Update to reflect the current valid options.
Proposed fix
- # faiss_store.db or sqlitevec_store.db
+ # faiss_store.db or pgvector_store.db
  self.db_filename = config.vector_store_type[11:] + "_store.db"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_rag_content/document_processor.py` around lines 340 - 341, The inline comment above the db filename construction is stale—remove or update it to reflect current valid vector store options instead of referencing the removed "sqlite-vec"; update the comment near self.db_filename and the expression that uses config.vector_store_type (which slices from index 11) to describe the actual store filenames produced (e.g., "faiss_store.db" or other supported stores) or remove the comment entirely so it no longer mentions sqlite-vec.
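For context on the `[11:]` slice discussed above: it drops an 11-character prefix from the vector store type before appending `_store.db`. The sketch below assumes that prefix is `llamastack-`, which is inferred from the `llamastack-sqlite-vec` choice and the `faiss_store.db` filename quoted in this review; `db_filename` is a hypothetical standalone helper, not the project's method.

```python
# Assumed prefix; it is 11 characters long, which matches the [11:] slice.
PREFIX = "llamastack-"


def db_filename(vector_store_type: str) -> str:
    """Derive the store filename the same way as the quoted expression."""
    return vector_store_type[len(PREFIX):] + "_store.db"


print(db_filename("llamastack-faiss"))  # faiss_store.db
```

Using `len(PREFIX)` instead of the literal `11` keeps the slice tied to the prefix it removes, so a comment listing the resulting filenames is less likely to go stale.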
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 260000d5-e64f-42ac-a09c-1f8ac030ab95
📒 Files selected for processing (6)
- scripts/query_rag.py
- src/lightspeed_rag_content/document_processor.py
- src/lightspeed_rag_content/utils.py
- tests/test_document_processor.py
- tests/test_document_processor_llama_stack.py
- tests/test_utils.py
💤 Files with no reviewable changes (3)
- tests/test_utils.py
- tests/test_document_processor_llama_stack.py
- src/lightspeed_rag_content/utils.py
sqlite-vec is deprecated and any reference to it should be removed. The functionality is however kept in the codebase for now. Reason for deprecation: package has no source distribution available for konflux to build.
Remove sqlite-vec from CLI choices, assertions, auto-detection, and tests across the codebase.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
71c612f to 002fd1a
PR is ready; waiting for other PRs in rag-content to be merged so the Konflux files can be regenerated.
Description
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
NA
Related Tickets & Documents
Checklist before requesting a review
Testing
NA
Summary by CodeRabbit
Breaking Changes
Documentation
Chores
sqlite-vec, adjusted dependency versions and build configuration.