CLI mode for lazybookmarks#2

Draft
corv89 wants to merge 26 commits into LLMCoolJ:release from corv89:release

Conversation

@corv89 corv89 commented Apr 5, 2026

Standalone CLI

  • 12 subcommands: import, organise, list, search, undo, dedup, check-links, model-list, model-set, model-download, status, doctor
  • Works with any OpenAI-compatible endpoint (default: local ollama). LLM_URL env var for power users to point at a remote LLM.
  • Preserves the original 3-phase AI pipeline (taxonomy → clustering → per-bookmark classification) with pruneTaxonomy and skip handling
  • Supports 4 model sizes: qwen3.5:0.8b, qwen3.5:2b (default), qwen3.5:4b, gemma4:e2b — with simplified schemas and few-shot examples for the 0.8b variant
  • CI/release workflows for Linux + macOS across x86_64 and arm64
  • Single static binary, no runtime dependencies beyond ollama + curl
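As a rough Python analog of the endpoint selection described above (the actual code is Nim; the variable names, the exact default URL, and the trailing-slash handling here are assumptions for illustration):

```python
import os

# Assumed default: local Ollama endpoint. The actual default in config.nim
# may differ (e.g. whether a /v1 suffix is included).
DEFAULT_OLLAMA_URL = "http://127.0.0.1:11434"

def resolve_endpoint() -> str:
    """Return the LLM base URL, letting the LLM_URL env var override the default."""
    return os.environ.get("LLM_URL", DEFAULT_OLLAMA_URL).rstrip("/")
```

Power users would set `LLM_URL` to point at any OpenAI-compatible server; everyone else gets the local default.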

New features (CLI-only, non-LLM)

Duplicate detection (dedup)

  • URL normalization: strips tracking params (utm_*, fbclid, gclid, etc.) and trailing slashes
  • Two confidence levels: normalized URL match (high) and same-domain + same-title match (medium)
  • Interactive review mode per group
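A minimal Python sketch of the URL normalization step (the actual blocklist of tracking parameters in the Nim code is longer; `TRACKING_PARAMS` here is an illustrative subset):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative subset; the real blocklist covers more trackers.
TRACKING_PARAMS = {"fbclid", "gclid"}

def normalize_url(url: str) -> str:
    """Strip tracking query params and trailing slashes for duplicate matching."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not (k.startswith("utm_") or k in TRACKING_PARAMS)]
    # Drop trailing slashes (keeping a bare "/" for the root path) and the fragment.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(kept), ""))
```

Two bookmarks whose normalized URLs match would fall into the high-confidence duplicate bucket.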

Dead link detection (check-links)

  • Parallelized HEAD requests via curl + xargs (configurable concurrency, default 8)
  • Follows redirects (up to 5 hops), filters tracking-param-only redirects
  • Falls back from HEAD to GET on 405/501/403; classifies each link as alive / dead / redirected / unknown

Performance optimizations

  • Async sliding-window concurrency for Phase 2 classification (default 4) — roughly 4× faster
  • Batch size auto-tuned by model size (5 small, 10 normal)
  • Few-shot JSON examples in prompts to improve first-attempt parse rate
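A Python asyncio analog of the sliding-window concurrency (the actual implementation is Nim async; `classify_batch` stands in for the real per-batch LLM call):

```python
import asyncio

async def classify_all(batches, classify_batch, window: int = 4):
    """Run classify_batch over all batches with at most `window` calls in flight."""
    sem = asyncio.Semaphore(window)

    async def run_one(batch):
        async with sem:               # blocks while `window` calls are active
            return await classify_batch(batch)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(run_one(b) for b in batches))
```

The semaphore keeps a fixed number of requests in flight, so a new batch starts as soon as any slot frees up rather than waiting for a whole batch group to finish.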

Extension takeaways

Dedup and dead-link detection are pure HTTP/SQL — no LLM needed — and would port easily into the extension's service worker. The concurrency model could similarly speed up classification using Promise.allSettled with a sliding window.

corv89 added 26 commits April 5, 2026 01:46
Port Chrome extension to standalone Nim binary using local LLM runtime.
Includes Ollama registry integration (qwen3.5-0.8b/2b/4b, gemma4-e2b),
3-phase AI bookmark organizer, SQLite storage, and cligen CLI with
10 subcommands. Downloads use curl --progress-bar with -C - for
resumable partial downloads.
Replace self-managed llama-server + GGUF model downloads with ollama
as the LLM backend. Ollama handles model management (pull/list) and
serves an OpenAI-compatible API, eliminating ~150 lines of download
progress, tarball extraction, and SHA256 verification code.

- runtime.nim: ollama serve/stop/health via shell + HTTP checks
- model.nim: ollama pull/list API, remove nimcrypto/sha256/curl download
- config.nim: default endpoint now 127.0.0.1:11434/v1, add modelName
- client.nim: send actual model name instead of hardcoded 'local'
- bootstrap.nim: simplified ensureReady (start ollama, pull model)
- models.json: stripped digest/sizeBytes, just name/ollamaModel/ollamaTag
- nimble: removed nimcrypto dependency (binary ~700KB, down from ~900KB)
…agement

- Fix {refStr} not interpolated in model pull messages
- Use string concat (&) instead of path join (/) for API URLs
- Remove spawn/poll/stop ollama lifecycle (treat as external service)
- Add requireRuntime with platform-specific start/install hints
- Remove unused pidFilePath/logFilePath/logsDir from config
- Increase retry backoff delay for slow model loading
- Add options.think=false to suppress thinking tags in qwen3.5
- Strip thinking tags, system-reminder tags before JSON extraction
- Increase max_tokens to 2048
- Include raw response in error messages for debugging
- Log model name and message previews in verbose mode
- Switch default from qwen3.5-0.8b to qwen3.5-2b for more reliable output
Use the  parameter for schema enforcement at the inference layer
(llama.cpp grammar-based constrained decoding) instead of the OpenAI-
compatible /v1/chat/completions endpoint's post-hoc response_format hint.
This fixes the small model (qwen3.5:0.8b) returning non-JSON responses.

- client.nim: native /api/chat with  param for Ollama, fall back
  to OpenAI /v1/chat/completions when LLM_URL is set (runtimeManaged=false)
- config.nim: default URL changed to native base (no /v1 suffix), strip
  trailing slashes in ollamaApiUrl(), remove unused readTomlInt
- organizer.nim: use full schemas for all model sizes (constrained
  decoding eliminates the need for simplified small-model schemas)
Outputs bookmarks grouped by AI-assigned category, with unorganized
bookmarks in an 'Unorganized' folder. Supports --output for file output
and --category for filtering a single folder.