Adding implementations for vector stores by AmoghTantradi · Pull Request #79 · lotus-data/lotus

AmoghTantradi · 2025-01-14T06:08:43Z

Verified tests pass locally (for those that can be run locally at least)


liana313 · 2025-02-11T02:39:30Z

lotus/sem_ops/sem_search.py

+            K = min(K, cur_min)

            search_K = K
            while True:


We should no longer manually post-filter for the non-FAISS vector stores. Instead, we should perform a filtered vector search using the VS API (most of them let you filter on an exact-match predicate).
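To make the difference concrete, here is a minimal sketch in plain Python. It does not use any real vector store client: `top_k` and `post_filtered_search` are hypothetical helpers, and a brute-force dot product stands in for the store's similarity search. The point is only that filtering the candidates before ranking returns K results in one pass, while post-filtering has to over-fetch and retry.

```python
from typing import Dict, List, Optional, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(
    vectors: Dict[int, Sequence[float]],
    query: Sequence[float],
    K: int,
    allowed_ids: Optional[Sequence[int]] = None,
) -> List[int]:
    """Brute-force top-K by dot product, optionally restricted to allowed_ids."""
    if allowed_ids is None:
        candidates = list(vectors)
    else:
        candidates = [i for i in allowed_ids if i in vectors]
    ranked = sorted(candidates, key=lambda i: dot(vectors[i], query), reverse=True)
    return ranked[:K]


def post_filtered_search(
    vectors: Dict[int, Sequence[float]],
    query: Sequence[float],
    K: int,
    allowed_ids: Sequence[int],
) -> List[int]:
    """Manual post-filtering: over-fetch, drop disallowed ids, retry with a larger K."""
    allowed = set(allowed_ids)
    search_K = K
    while True:
        hits = [i for i in top_k(vectors, query, search_K) if i in allowed]
        if len(hits) >= K or search_K >= len(vectors):
            return hits[:K]
        search_K *= 2


vectors = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [0.8, 0.2]}
query = [1.0, 0.0]
allowed = [2, 3]

filtered = top_k(vectors, query, 1, allowed_ids=allowed)           # one pass
post_filtered = post_filtered_search(vectors, query, 1, allowed)   # three passes here
```

Both calls return the same id for this data, but the post-filtered version needs three rounds of search to find it because the two best unfiltered matches are outside the allowed set.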

liana313 · 2025-02-11T02:45:55Z

lotus/sem_ops/sem_index.py

            pd.DataFrame: The DataFrame with the index directory saved.
        """
-        if lotus.settings.rm is None:
+


It may be good to display a warning here telling the user not to reset the df index. Getting embedding ids requires that the df index stay invariant, so that get_vectors_from_index returns the correct vectors.
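One possible shape for such a warning, using only the standard `warnings` module. The helper name `warn_index_must_stay_fixed` and the message wording are assumptions for illustration, not code from this PR.

```python
import warnings


def warn_index_must_stay_fixed(col_name: str) -> None:
    """Hypothetical helper: remind the caller that embedding ids are tied to the df index."""
    warnings.warn(
        f"Column '{col_name}' was indexed using the current DataFrame index as embedding ids. "
        "Do not reset or reassign the index before searching, "
        "or vectors may be looked up under the wrong ids.",
        UserWarning,
        stacklevel=2,  # point the warning at the caller rather than at this helper
    )
```

Calling this once at the end of the indexing step would surface the constraint without changing behavior.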

liana313 · 2025-02-11T02:51:00Z

lotus/vector_store/chroma_vs.py

+        metadatas: list[Mapping[str, Union[str, int, float, bool]]] = [{"doc_id": int(i)} for i in range(len(docs_list))]
+
+        # Add documents in batches
+        batch_size = 100


It would be better to add a vs_upsert_batch_size setting that can be configured in our settings module, rather than hard-coding the batch size.
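A rough sketch of what that could look like. The `Settings` dataclass below is a stand-in and not the actual lotus settings module; only the name `vs_upsert_batch_size` comes from the comment above, and `iter_batches` and `upsert_all` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class Settings:
    # Assumed field; the default mirrors the hard-coded value in the diff.
    vs_upsert_batch_size: int = 100


settings = Settings()


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def upsert_all(
    docs: Sequence[T],
    upsert_fn: Callable[[Sequence[T]], None],
    batch_size: Optional[int] = None,
) -> int:
    """Send docs to upsert_fn in batches; returns the number of batches sent."""
    size = settings.vs_upsert_batch_size if batch_size is None else batch_size
    n_batches = 0
    for batch in iter_batches(docs, size):
        upsert_fn(batch)
        n_batches += 1
    return n_batches
```

Each vector store implementation could then read the same setting instead of carrying its own constant.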

liana313 · 2025-02-11T02:55:10Z

lotus/vector_store/vs.py

    @abstractmethod 
-    def search(self,
-    queries: pd.Series | str | Image.Image | list | NDArray[np.float64],
+    def __call__(self,


It might make sense to add an ids parameter here, so that sem ops can make filtered search calls easily rather than re-implementing VS-specific logic each time.
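A sketch of the interface change being suggested, under some simplifying assumptions: the return type here is just a list of id lists per query (the real code returns an RMOutput with distances and indices), and `InMemoryVS` is a toy subclass written for this example, not one of the stores in this PR.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Sequence


class VS(ABC):
    """Sketch of a vector-store interface whose search call accepts candidate ids."""

    @abstractmethod
    def __call__(
        self,
        query_vectors: Sequence[Sequence[float]],
        K: int,
        ids: Optional[Sequence[int]] = None,
    ) -> List[List[int]]:
        """Return up to K ids per query; if ids is given, only those ids may be returned."""


class InMemoryVS(VS):
    """Toy brute-force store used only to exercise the signature."""

    def __init__(self, vectors: Dict[int, Sequence[float]]):
        self.vectors = vectors

    def __call__(self, query_vectors, K, ids=None):
        if ids is None:
            candidates = list(self.vectors)
        else:
            candidates = [i for i in ids if i in self.vectors]
        results = []
        for q in query_vectors:
            def score(i, q=q):
                return sum(a * b for a, b in zip(self.vectors[i], q))
            results.append(sorted(candidates, key=score, reverse=True)[:K])
        return results
```

With this shape, each concrete store translates `ids` into its own filter syntax once, and the sem ops only ever pass a list of ids.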

liana313 · 2025-02-11T02:57:03Z

lotus/vector_store/pinecone_vs.py

+            })
+
+        # Upsert in batches of 100
+        batch_size = 100


Move the batch size to a settings configuration (same comment for all VS implementations).

liana313 · 2025-02-11T03:02:51Z

lotus/sem_ops/sem_sim_join.py

-        rm_output: RMOutput = rm(queries, K)
-        distances = rm_output.distances
-        indices = rm_output.indices
+        vs_output: RMOutput = vs(query_vectors, K)


We need to add a parameter for ids in vs() and set it to the index of the df we're searching over. Otherwise we won't catch the case where the df being searched over has been filtered by the user.

liana313 · 2025-02-11T03:06:17Z

.github/tests/rm_tests.py

+# VS Only Tests
+################################################################################
+
+


An important test to add is filtered vector search, i.e. the program starts with some df, embeds/indexes the column, does any filter op (can be a structured filter), then calls a sem op that uses search over the indexed column.
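The flow described above could be tested roughly as follows. This is a self-contained sketch with stand-ins for everything: a dict of rows instead of a df, a toy `embed` function instead of a real embedding model, and a brute-force `search` instead of a vector store. None of these names come from the lotus codebase.

```python
from typing import Dict, List, Optional, Sequence


def embed(text: str) -> List[float]:
    """Toy embedding (an assumption for this sketch): counts of the letters 'a' and 'b'."""
    return [float(text.count("a")), float(text.count("b"))]


def search(
    index: Dict[int, List[float]],
    query: Sequence[float],
    K: int,
    ids: Optional[Sequence[int]] = None,
) -> List[int]:
    """Brute-force dot-product search, restricted to ids when given."""
    candidates = list(index) if ids is None else [i for i in ids if i in index]

    def score(i: int) -> float:
        return sum(x * y for x, y in zip(index[i], query))

    return sorted(candidates, key=score, reverse=True)[:K]


def test_filtered_vector_search() -> None:
    # 1. Start with some rows keyed by a stable id (stand-in for the df index).
    rows = {
        0: {"text": "aaaa", "year": 2019},
        1: {"text": "aaab", "year": 2021},
        2: {"text": "bbbb", "year": 2022},
        3: {"text": "aabb", "year": 2023},
    }
    # 2. Embed/index the text column over ALL rows.
    index = {i: embed(r["text"]) for i, r in rows.items()}
    # 3. Apply a structured filter after indexing.
    kept = [i for i, r in rows.items() if r["year"] >= 2021]
    # 4. Search over the indexed column, restricted to the surviving ids.
    hits = search(index, embed("aaaa"), K=2, ids=kept)

    assert set(hits) <= set(kept)   # nothing filtered out may come back
    assert hits == [1, 3]
    # Without the ids restriction, the filtered-out row 0 is the best match.
    assert search(index, embed("aaaa"), K=2)[0] == 0
```

The key property is the first assertion: the best unfiltered match was removed by the filter, so a search that ignores the filter would fail the test.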

AmoghTantradi added 16 commits January 12, 2025 11:12

initial scaffolding for adding vector store / vector database integration

e3abd90

fixed linting, ruff checks pass

bd1e8fd

added changes to requirements.txt file and added additional abstract methods

880c31f

refactored

7b5dfd3

added tests for clustering and filtering

08dfaba

made edits to test_filter

f3a82c1

added implementations for weaviate and pinecone vs

fc62846

fixed merge conflicts

3e89b5f

added extra refactoring and added implementations for qdrant and chroma_vs

f2937ad

fixed some type errors

a4c7418

made further corrections

1357fb3

edit uuid type

c76b658

changed uuid type

9f257f7

made type changes to weaviate file

99cb535

made another change

3c8a742

typecheck passes for weaviate?

ccd9e48

AmoghTantradi force-pushed the at/vs_implementation branch from b53d8a2 to ccd9e48 on January 15, 2025 06:40

AmoghTantradi and others added 13 commits January 15, 2025 17:09

type changes for weaviate and qdrant files

89bf974

made changes to weaviate file

a76adb7

made changes to weaviate file

c3e0f0c

fixed pinecone type errors

1782281

fixed pinecone type errors

0621b9b

type checks all pass locally

b568d1e

fixed linting errors

9b33a1f

made refactors to allow for testing

820f3be

made changes to tests

a0a70d2

fixed

6dbd1db

changed setattr to getattr

bea1d19

fixed a test

f93f7ed

over

38ff87d

second refactor (removed index_dir)

23bafa5

AmoghTantradi force-pushed the at/vs_implementation branch from d125eda to 23bafa5 on January 27, 2025 16:39

AmoghTantradi added 20 commits January 27, 2025 09:25

fixed type checks

75d11ea

fixed retriever module errors

0b0bf38

fixed key error

6bf7926

added fixes to failing rm tests

f7071a2

fixed chroma

6ebe407

removed dynamic indexing for weaviatevs

e588bee

fixed type errors

d6a86e1

changed weaviate index config

ddfd549

changed rm tests index name to avoid pinecone failures

20206e1

fixed naming convention for index_dir and fixed serverless spec for pc index

e7ea24f

changed serverless spec for pc index due to free plan

f152b54

added debug statement

2e21a97

made changes to errors

524b501

Merge branch 'main' of github.com:guestrin-lab/lotus into at/vs_implementation

87f57e1

Also note that we now need to include deleting previous collections and restarting new ones, plus error handling for the dimension-mismatch case

added some fixes to collection upload error handling

e995996

made some other change

1a75486

fixed type errors for qdrant vs

c5f50f6

changed endpoint

85daf51

added changes

6b80fd3

Merge branch 'main' of github.com:guestrin-lab/lotus into at/vs_implementation

4bafdb7

AmoghTantradi force-pushed the at/vs_implementation branch from 66f47e3 to 4bafdb7 on February 9, 2025 03:29

AmoghTantradi added 6 commits February 8, 2025 19:43

added fixes

f90ff0f

added some changes

cccfa39

added some change

6cf4f0a

another set of changes

0438b18

added other logs

43e9bc3

added logging

90d07d0

liana313 reviewed Feb 11, 2025

AmoghTantradi wants to merge 58 commits into main from at/vs_implementation
