feat: v0.5.0 - Local vector storage, OpenRouter support, free-text questions, and more#8

Open
sherlock2010song wants to merge 1 commit into hi-jin:main from sherlock2010song:main

Conversation

@sherlock2010song

Changelog

v0.5.0

  • Local Vector Storage: Optimized performance by caching vector embeddings locally using IndexedDB. This significantly speeds up subsequent queries for the same PDF.
  • OpenRouter Support: Added support for the OpenRouter API, allowing you to use models such as Claude 3 and Llama 3.
  • Free-text Questions: You can now ask questions directly in a block without needing to reference a specific highlight.
  • Progress Indicator: Added a real-time progress indicator for the PDF embedding process.
  • API Key Verification: Added a button in settings to verify your API key and connection.
  • Clear Cache: Added an option to clear the local vector store cache.
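The local vector storage described above amounts to a two-tier lookup: an in-memory map consulted first, with IndexedDB as the slower persistent fallback, and an embedding pass only on a full miss. The following is a minimal sketch of that pattern; `cachedLoad` and its parameter names are hypothetical, not the plugin's actual API:

```typescript
// Hypothetical two-tier cache lookup: check the in-memory Map, then the
// persistent layer (IndexedDB in the plugin), and only compute (embed)
// on a full miss. The Map is backfilled on a persistent hit so repeat
// lookups stay fast within the session.
export async function cachedLoad<T>(
  key: string,
  memory: Map<string, T>,
  loadPersistent: (key: string) => Promise<T | undefined>,
  compute: () => Promise<T>,
  savePersistent: (key: string, value: T) => Promise<void>,
): Promise<T> {
  const hit = memory.get(key);
  if (hit !== undefined) return hit;

  const stored = await loadPersistent(key);
  if (stored !== undefined) {
    memory.set(key, stored); // backfill the fast tier
    return stored;
  }

  const fresh = await compute(); // expensive path: embed the PDF
  memory.set(key, fresh);
  await savePersistent(key, fresh);
  return fresh;
}
```

With this shape, the expensive `compute` step runs at most once per cache key, which is what makes repeat queries against the same PDF fast.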

Copilot AI review requested due to automatic review settings December 9, 2025 14:06

Copilot AI left a comment


Pull request overview

This PR introduces version 0.5.0 with significant performance and usability improvements. The main enhancement is local vector storage using IndexedDB to cache embeddings, eliminating the need to re-process PDFs on subsequent queries. Additionally, the plugin now supports the OpenRouter API for accessing various LLM models beyond OpenAI, and enables free-text questions without requiring specific highlight references.

  • Local caching via IndexedDB significantly improves performance for repeat queries
  • OpenRouter API support expands available model options
  • Free-text questions enhance usability by allowing direct questions without highlights
  • Progress indicators and API key verification improve user experience
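The per-file cache keys mentioned in the file summary below are derived by hashing the PDF. One way that derivation could look is hashing the file's raw bytes, so the key is stable across renames and path changes; this sketch uses Node's `crypto` module for illustration, and `vectorCacheKey` is a hypothetical name rather than the plugin's actual function:

```typescript
import { createHash } from "node:crypto";

// Derive a stable cache key from the PDF's raw bytes, so the same file
// always maps to the same IndexedDB record regardless of where it lives.
export function vectorCacheKey(fileBytes: Uint8Array): string {
  return createHash("sha256").update(fileBytes).digest("hex");
}
```

Identical bytes always produce the identical key, which is the property the cache needs.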

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 9 comments.

File Description
src/utils/indexedDB.ts New utility file implementing IndexedDB operations for storing and retrieving vector embeddings with file hashing for cache keys
src/settings.ts Added API base URL configuration, API key verification button, and cache clearing option; updated UI labels to reflect OpenRouter support
src/openai.ts Implemented two-tier caching (in-memory + IndexedDB), batch processing with progress callbacks, and support for free-text questions via updated invoke signature
src/main.ts Added logic to handle both highlight-based and free-text questions, integrated progress indicators for PDF processing
package.json Version bumped to 0.5.0
package-lock.json Version update to match package.json changes
README.md Updated documentation with OpenRouter setup instructions, free-text question usage, and v0.5.0 changelog


if (!highlight) {
  await logseq.UI.showMsg(`Please check whether the highlight uuid is on current line.`, "warning");
  return;
}
let inputData: any = null;

Copilot AI Dec 9, 2025


The variable name inputData with type any is too generic. Consider using a more descriptive name like questionSource or a union type like Highlight | string instead of any for better type safety.

});

// ask the model to describe the image
const apiBaseUrl = readAPIBaseUrl();

Copilot AI Dec 9, 2025


apiBaseUrl is read again here (line 227) even though it was already read at line 149. Since the variable is already in scope from the parent function, this line is redundant and can be removed in favor of the existing apiBaseUrl variable.

Suggested change
const apiBaseUrl = readAPIBaseUrl();

logseq.updateSettings({ checkApiKey: false });
const apiKey = newSettings.openaiApiKey;
if (!apiKey) {
logseq.UI.showMsg("Please enter an OpenAI API key first.", "error");

Copilot AI Dec 9, 2025


The error message "Please enter an OpenAI API key first." is misleading since the field now accepts both OpenAI and OpenRouter API keys. Consider changing to "Please enter an API key first." to match the updated field description.

Suggested change
logseq.UI.showMsg("Please enter an OpenAI API key first.", "error");
logseq.UI.showMsg("Please enter an API key first.", "error");

Comment on lines +103 to +109
const model = new ChatOpenAI({
openAIApiKey: apiKey,
modelName: "gpt-3.5-turbo", // Use a cheap model for checking
maxTokens: 5,
configuration,
});
await model.invoke("Hello");

Copilot AI Dec 9, 2025


The API key verification always uses the gpt-3.5-turbo model, which will fail for OpenRouter or other non-OpenAI endpoints. When using OpenRouter, you need to specify a valid OpenRouter model name (e.g., openai/gpt-3.5-turbo). Consider using a model from settings or detecting the endpoint type based on apiBaseUrl to choose an appropriate test model.
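One way to act on this suggestion is a small helper that picks the test model from the configured base URL. This is a sketch under the assumption that OpenRouter is the only non-OpenAI endpoint to special-case; `pickVerificationModel` is a hypothetical name, and a settings-driven model would be more robust:

```typescript
// Hypothetical helper: choose a cheap verification model that the
// configured endpoint will accept. OpenRouter expects provider-prefixed
// model ids ("openai/gpt-3.5-turbo"), while the OpenAI API does not.
export function pickVerificationModel(apiBaseUrl: string): string {
  if (apiBaseUrl.includes("openrouter.ai")) {
    return "openai/gpt-3.5-turbo";
  }
  return "gpt-3.5-turbo";
}
```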

Comment on lines +46 to +61
// MemoryVectorStore exposes memoryVectors but it's protected/private in TS definition
// We need to access it to serialize.
// In LangChain JS MemoryVectorStore, the data is stored in `memoryVectors`
// Each item has { content: string, embedding: number[], metadata: object }

// However, MemoryVectorStore doesn't have a direct export method that gives us everything easily for reconstruction
// without re-embedding.
// But we can access `memoryVectors` by casting to any.

const memoryVectors = (vectorStore as any).memoryVectors;

// We need to store documents and vectors.
// Actually MemoryVectorStore.fromDocuments creates vectors from documents.
// If we want to avoid re-embedding, we need to save the embeddings.
// MemoryVectorStore.fromVectors(vectors, documents, embeddings) is what we want to use for restoration.


Copilot AI Dec 9, 2025


These comments describe internal implementation details about accessing private/protected members of MemoryVectorStore. While helpful, they span multiple lines and could be more concise. Consider consolidating to: // Access memoryVectors (protected property) to serialize embeddings and avoid re-embedding on restore

Suggested change
// MemoryVectorStore exposes memoryVectors but it's protected/private in TS definition
// We need to access it to serialize.
// In LangChain JS MemoryVectorStore, the data is stored in `memoryVectors`
// Each item has { content: string, embedding: number[], metadata: object }
// However, MemoryVectorStore doesn't have a direct export method that gives us everything easily for reconstruction
// without re-embedding.
// But we can access `memoryVectors` by casting to any.
const memoryVectors = (vectorStore as any).memoryVectors;
// We need to store documents and vectors.
// Actually MemoryVectorStore.fromDocuments creates vectors from documents.
// If we want to avoid re-embedding, we need to save the embeddings.
// MemoryVectorStore.fromVectors(vectors, documents, embeddings) is what we want to use for restoration.
// Access memoryVectors (protected property) to serialize embeddings and avoid re-embedding on restore
const memoryVectors = (vectorStore as any).memoryVectors;

Comment on lines 67 to 68
> [!Note]
> You can move your cursor or continue working in Logseq while the response is being generated.

Copilot AI Dec 9, 2025


The Note block at lines 67-68 appears to be duplicated from lines 54-55. Consider removing this duplication to keep the documentation clean.

Suggested change
> [!Note]
> You can move your cursor or continue working in Logseq while the response is being generated.

Comment on lines +20 to +21
console.error("IndexedDB error:", event);
reject("IndexedDB error");

Copilot AI Dec 9, 2025


The error message "IndexedDB error" is not descriptive enough. Consider including the actual error from the event: reject(new Error("IndexedDB error: " + (event.target as any)?.error?.message)) to help with debugging.

Suggested change
console.error("IndexedDB error:", event);
reject("IndexedDB error");
const errorMsg = (event.target as any)?.error?.message || "";
console.error("IndexedDB error:", event);
reject(new Error("IndexedDB error: " + errorMsg));

Comment on lines +41 to +79
const db = await openDB();
return new Promise((resolve, reject) => {
const transaction = db.transaction([STORE_NAME], "readwrite");
const store = transaction.objectStore(STORE_NAME);

// MemoryVectorStore exposes memoryVectors but it's protected/private in TS definition
// We need to access it to serialize.
// In LangChain JS MemoryVectorStore, the data is stored in `memoryVectors`
// Each item has { content: string, embedding: number[], metadata: object }

// However, MemoryVectorStore doesn't have a direct export method that gives us everything easily for reconstruction
// without re-embedding.
// But we can access `memoryVectors` by casting to any.

const memoryVectors = (vectorStore as any).memoryVectors;

// We need to store documents and vectors.
// Actually MemoryVectorStore.fromDocuments creates vectors from documents.
// If we want to avoid re-embedding, we need to save the embeddings.
// MemoryVectorStore.fromVectors(vectors, documents, embeddings) is what we want to use for restoration.

const vectors = memoryVectors.map((v: any) => v.embedding);
const documents = memoryVectors.map((v: any) => new Document({
pageContent: v.content,
metadata: v.metadata,
}));

const data: StoredVectorStore = {
id: key,
vectors,
documents,
timestamp: Date.now(),
};

const request = store.put(data);

request.onsuccess = () => resolve();
request.onerror = () => reject(request.error);
});

Copilot AI Dec 9, 2025


The database connection opened in openDB() is not being closed after operations complete. Consider adding db.close() calls after transactions finish in each function (saveVectorStoreToDB, loadVectorStoreFromDB, clearVectorStoreFromDB, clearAllVectorStoresFromDB) to avoid keeping unnecessary connections open.

const loader = new PDFLoader(pdf, {
pdfjs: () => pdfjs as any,
});
const docs = await loader.load()

Copilot AI Dec 9, 2025


Avoid automated semicolon insertion (97% of all statements in the enclosing function have an explicit semicolon).

Suggested change
const docs = await loader.load()
const docs = await loader.load();

@hi-jin
Owner

hi-jin commented Dec 10, 2025

Thank you for your pull request. The features you added are ones this repo really needs.
I may have time to review this PR this weekend (maybe?).
Thanks.
