feat: v0.5.0 - Local vector storage, OpenRouter support, free-text questions, and more #8
sherlock2010song wants to merge 1 commit into hi-jin:main
Conversation
Pull request overview
This PR introduces version 0.5.0 with significant performance and usability improvements. The main enhancement is local vector storage using IndexedDB to cache embeddings, eliminating the need to re-process PDFs on subsequent queries. Additionally, the plugin now supports OpenRouter API for accessing various LLM models beyond OpenAI, and enables free-text questions without requiring specific highlight references.
- Local caching via IndexedDB significantly improves performance for repeat queries
- OpenRouter API support expands available model options
- Free-text questions enhance usability by allowing direct questions without highlights
- Progress indicators and API key verification improve user experience
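The two-tier lookup called out above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code; all identifiers (`memoryCache`, `getVectorStore`, the callback names) are assumed:

```typescript
// Sketch of a two-tier cache: check an in-memory map first, then fall
// back to a persistent store, and only build (re-embed) on a full miss.
// All identifiers here are illustrative, not the PR's actual names.
type VectorStore = { vectors: number[][] };

const memoryCache = new Map<string, VectorStore>();

async function getVectorStore(
  key: string,
  loadFromDB: (key: string) => Promise<VectorStore | null>,
  build: () => Promise<VectorStore>,
  saveToDB: (key: string, store: VectorStore) => Promise<void>,
): Promise<VectorStore> {
  const cached = memoryCache.get(key); // tier 1: in-memory
  if (cached) return cached;

  const persisted = await loadFromDB(key); // tier 2: IndexedDB
  if (persisted) {
    memoryCache.set(key, persisted);
    return persisted;
  }

  const fresh = await build(); // full miss: embed the PDF once
  memoryCache.set(key, fresh);
  await saveToDB(key, fresh);
  return fresh;
}
```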
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| src/utils/indexedDB.ts | New utility file implementing IndexedDB operations for storing and retrieving vector embeddings, with file hashing for cache keys |
| src/settings.ts | Added API base URL configuration, an API key verification button, and a cache-clearing option; updated UI labels to reflect OpenRouter support |
| src/openai.ts | Implemented two-tier caching (in-memory + IndexedDB), batch processing with progress callbacks, and support for free-text questions via an updated invoke signature |
| src/main.ts | Added logic to handle both highlight-based and free-text questions; integrated progress indicators for PDF processing |
| package.json | Version bumped to 0.5.0 |
| package-lock.json | Version updated to match package.json |
| README.md | Updated documentation with OpenRouter setup instructions, free-text question usage, and the v0.5.0 changelog |
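The "file hashing for cache keys" mentioned for `src/utils/indexedDB.ts` can be illustrated with a minimal sketch. The PR most likely uses a cryptographic hash (e.g. SHA-256 via Web Crypto); this FNV-1a variant only shows the idea of deriving a stable key from file bytes, and the function name is assumed:

```typescript
// Illustrative only: derive a stable cache key from a file's bytes so the
// same PDF maps to the same IndexedDB entry across sessions. A real
// implementation would likely use SHA-256 via Web Crypto instead of FNV-1a.
function fileCacheKey(bytes: Uint8Array): string {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (const b of bytes) {
    hash ^= b;
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits
  }
  return hash.toString(16).padStart(8, "0");
}
```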
```ts
if (!highlight) {
  await logseq.UI.showMsg(`Please check whether the highlight uuid is on current line.`, "warning");
  return;
// ...
let inputData: any = null;
```
The variable name `inputData` with type `any` is too generic. Consider a more descriptive name like `questionSource` and a union type such as `Highlight | string` instead of `any` for better type safety.
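A sketch of what that union typing could look like; the `Highlight` shape is assumed for illustration and is not taken from the PR:

```typescript
// Sketch of the suggested typing: a union instead of `any`.
// The Highlight shape here is assumed, not the PR's actual interface.
interface Highlight {
  uuid: string;
  text: string;
}

type QuestionSource = Highlight | string; // highlight-based or free-text

function describeSource(source: QuestionSource): string {
  // Narrowing the union: a string is a free-text question,
  // an object is a highlight reference.
  return typeof source === "string"
    ? `free-text question: ${source}`
    : `highlight ${source.uuid}: ${source.text}`;
}
```

With this typing, the compiler forces each branch to handle only the fields that actually exist on that variant.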
```ts
});

// ask the model to describe the image
const apiBaseUrl = readAPIBaseUrl();
```
`apiBaseUrl` is read again here (line 227) even though it was already read at line 149 and is still in scope from the parent function. Consider removing this redundant line and reusing the existing `apiBaseUrl` variable.
Suggested change:

```diff
- const apiBaseUrl = readAPIBaseUrl();
```
```ts
logseq.updateSettings({ checkApiKey: false });
const apiKey = newSettings.openaiApiKey;
if (!apiKey) {
  logseq.UI.showMsg("Please enter an OpenAI API key first.", "error");
```
The error message "Please enter an OpenAI API key first." is misleading since the field now accepts both OpenAI and OpenRouter API keys. Consider changing to "Please enter an API key first." to match the updated field description.
Suggested change:

```diff
- logseq.UI.showMsg("Please enter an OpenAI API key first.", "error");
+ logseq.UI.showMsg("Please enter an API key first.", "error");
```
```ts
const model = new ChatOpenAI({
  openAIApiKey: apiKey,
  modelName: "gpt-3.5-turbo", // Use a cheap model for checking
  maxTokens: 5,
  configuration,
});
await model.invoke("Hello");
```
The API key verification always uses the `gpt-3.5-turbo` model, which will fail for OpenRouter or other non-OpenAI endpoints. OpenRouter requires a provider-prefixed model name (e.g., `openai/gpt-3.5-turbo`). Consider using a model from settings, or detecting the endpoint type from `apiBaseUrl` to choose an appropriate test model.
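One way to implement that suggestion, sketched under assumptions: the OpenRouter provider-prefix convention (`openai/gpt-3.5-turbo`) is real, but the function name and the substring check on the base URL are illustrative, not the PR's code:

```typescript
// Illustrative sketch: choose a verification model based on the endpoint.
// OpenRouter namespaces models by provider ("openai/gpt-3.5-turbo"), while
// the OpenAI API expects the bare model name. The detection heuristic
// (substring match on the host) is an assumption for this sketch.
function pickVerificationModel(apiBaseUrl: string): string {
  return apiBaseUrl.includes("openrouter.ai")
    ? "openai/gpt-3.5-turbo"
    : "gpt-3.5-turbo";
}
```

The returned name would then be passed as `modelName` when constructing the `ChatOpenAI` instance used for the verification ping.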
```ts
// MemoryVectorStore exposes memoryVectors but it's protected/private in TS definition
// We need to access it to serialize.
// In LangChain JS MemoryVectorStore, the data is stored in `memoryVectors`
// Each item has { content: string, embedding: number[], metadata: object }

// However, MemoryVectorStore doesn't have a direct export method that gives us everything easily for reconstruction
// without re-embedding.
// But we can access `memoryVectors` by casting to any.

const memoryVectors = (vectorStore as any).memoryVectors;

// We need to store documents and vectors.
// Actually MemoryVectorStore.fromDocuments creates vectors from documents.
// If we want to avoid re-embedding, we need to save the embeddings.
// MemoryVectorStore.fromVectors(vectors, documents, embeddings) is what we want to use for restoration.
```
These comments describe internal implementation details about accessing private/protected members of MemoryVectorStore. While helpful, they span many lines and could be more concise. Consider consolidating them to: `// Access memoryVectors (protected property) to serialize embeddings and avoid re-embedding on restore`
Suggested change:

```diff
- // MemoryVectorStore exposes memoryVectors but it's protected/private in TS definition
- // We need to access it to serialize.
- // In LangChain JS MemoryVectorStore, the data is stored in `memoryVectors`
- // Each item has { content: string, embedding: number[], metadata: object }
- // However, MemoryVectorStore doesn't have a direct export method that gives us everything easily for reconstruction
- // without re-embedding.
- // But we can access `memoryVectors` by casting to any.
- const memoryVectors = (vectorStore as any).memoryVectors;
- // We need to store documents and vectors.
- // Actually MemoryVectorStore.fromDocuments creates vectors from documents.
- // If we want to avoid re-embedding, we need to save the embeddings.
- // MemoryVectorStore.fromVectors(vectors, documents, embeddings) is what we want to use for restoration.
+ // Access memoryVectors (protected property) to serialize embeddings and avoid re-embedding on restore
+ const memoryVectors = (vectorStore as any).memoryVectors;
```
```md
> [!Note]
> You can move your cursor or continue working in Logseq while the response is being generated.
```
The Note block at lines 67-68 appears to be duplicated from lines 54-55. Consider removing this duplication to keep the documentation clean.
Suggested change:

```diff
- > [!Note]
- > You can move your cursor or continue working in Logseq while the response is being generated.
```
```ts
console.error("IndexedDB error:", event);
reject("IndexedDB error");
```
The error message "IndexedDB error" is not descriptive enough. Consider including the actual error from the event: reject(new Error("IndexedDB error: " + (event.target as any)?.error?.message)) to help with debugging.
Suggested change:

```diff
- console.error("IndexedDB error:", event);
- reject("IndexedDB error");
+ const errorMsg = (event.target as any)?.error?.message || "";
+ console.error("IndexedDB error:", event);
+ reject(new Error("IndexedDB error: " + errorMsg));
```
```ts
const db = await openDB();
return new Promise((resolve, reject) => {
  const transaction = db.transaction([STORE_NAME], "readwrite");
  const store = transaction.objectStore(STORE_NAME);

  // MemoryVectorStore exposes memoryVectors but it's protected/private in TS definition
  // We need to access it to serialize.
  // In LangChain JS MemoryVectorStore, the data is stored in `memoryVectors`
  // Each item has { content: string, embedding: number[], metadata: object }

  // However, MemoryVectorStore doesn't have a direct export method that gives us everything easily for reconstruction
  // without re-embedding.
  // But we can access `memoryVectors` by casting to any.

  const memoryVectors = (vectorStore as any).memoryVectors;

  // We need to store documents and vectors.
  // Actually MemoryVectorStore.fromDocuments creates vectors from documents.
  // If we want to avoid re-embedding, we need to save the embeddings.
  // MemoryVectorStore.fromVectors(vectors, documents, embeddings) is what we want to use for restoration.

  const vectors = memoryVectors.map((v: any) => v.embedding);
  const documents = memoryVectors.map((v: any) => new Document({
    pageContent: v.content,
    metadata: v.metadata,
  }));

  const data: StoredVectorStore = {
    id: key,
    vectors,
    documents,
    timestamp: Date.now(),
  };

  const request = store.put(data);

  request.onsuccess = () => resolve();
  request.onerror = () => reject(request.error);
});
```
The database connection opened in openDB() is not being closed after operations complete. Consider adding db.close() calls after transactions finish in each function (saveVectorStoreToDB, loadVectorStoreFromDB, clearVectorStoreFromDB, clearAllVectorStoresFromDB) to avoid keeping unnecessary connections open.
```ts
const loader = new PDFLoader(pdf, {
  pdfjs: () => pdfjs as any,
});
const docs = await loader.load()
```
Avoid automated semicolon insertion (97% of all statements in the enclosing function have an explicit semicolon).
Suggested change:

```diff
- const docs = await loader.load()
+ const docs = await loader.load();
```
Thank you for your pull request. The features you added are exactly what this repo needs.
Changelog
v0.5.0