YOUR RESEARCH — YOUR DATA
Manage your publications library, ask questions, and get summaries — all on your own computer.
Connect a local LLM (for example llama.cpp) and ResearchBlocks can give you smarter answers, summaries, citations, and more. ResearchBlocks does not require an internet connection to run your library and local features.
Author: Jude Janitha Niroshan (Bukkbeek)
- Free and open source under the MIT License.
- You self-host ResearchBlocks (basic localhost use is enough; setup steps are below).
- An LLM is optional; ResearchBlocks functions fully for library, search, BibTeX, and extractive features without one. Natural-language answers and enrichment need a connected LLM.
- Researchers and students who want a private library of research publications.
- Anyone tired of juggling folders, citation managers, and ad hoc search.
- People who want plain-language questions powered by an LLM (when you add one).
- Anyone who wants a personal research assistant that only sees your data.
You do not need to be a programmer — just follow the setup steps once.
| You want to… | ResearchBlocks helps by… |
|---|---|
| Keep papers in one place | Library with search, tags, and optional PDF storage |
| Import PDFs quickly | Add Data — scan a folder or Browse (Chrome/Edge) to upload many PDFs at once |
| Organize by topic | Hierarchy — categories like Biology, Medicine, Agriculture, plus Discovered topics from your keywords |
| Link formal citations | BibTeX — import .bib, link entries to papers, export |
| Ask “what did my papers say about X?” | Query — finds relevant passages and (with an LLM) writes a short cited answer |
| Skim a long paper | Summary on each paper (extractive; richer summaries with Generate + LLM) |
| Fix messy metadata | Generate on a paper — cleanup, abstract and authors (with verification), keywords, hierarchy, summary |
| Start over | Settings → cleanup database (removes all local data) |
Without an LLM, you still get search, hierarchy, BibTeX, extractive summaries, and passage-based query results.
With an LLM, you get natural-language answers, smarter tagging, and automated enrichment.
- A computer — Windows, macOS, or Linux
- Node.js — v18 or newer (the LTS installer is fine)
- This repository — clone with Git or download the ZIP from GitHub
- (Optional, recommended for full assistant features) Local LLM — an OpenAI-compatible server such as llama.cpp's `llama-server` on port 8080 (see below)
| OS | How to install |
|---|---|
| Windows | Download the LTS .msi from nodejs.org. Restart the terminal, then run node -v (should be ≥ 18). |
| macOS | Use the LTS pkg from nodejs.org, or Homebrew: brew install node, or a version manager like nvm. |
| Linux (Debian/Ubuntu) | Example with NodeSource (Node 20): curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash - then sudo apt install -y nodejs. Or use your distro’s packages if they meet the version requirement. |
Verify:
```bash
node -v
npm -v
```

Git:

```bash
git clone https://github.com/bukkbeek/ResearchBlocks.git
cd ResearchBlocks
```

(Use your fork URL if you forked the repo.)
Without Git: On GitHub, use Code → Download ZIP, extract it, and open a terminal in the extracted folder.
```bash
npm install
```

This installs the libraries the app needs (web server, PDF text extraction, BibTeX parsing, etc.). Run this once per copy of the project.
```bash
npm start
```

You should see that the server is running on port 3000.
Go to http://localhost:3000.
Important: Always use the address above. Do not open public/index.html with Live Server or by double-clicking the file — the app needs the Node server.
Everything lives under data/db/ in the project folder (papers as JSON, optional PDF copies, BibTeX imports). That folder is created automatically. Back it up if you care about your library.
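A quick way to back it up is a date-stamped archive of that folder (a sketch; the archive name is an example, and the `mkdir -p` only makes the command safe to run before any data exists):

```shell
# Run from the ResearchBlocks project root.
mkdir -p data/db    # no-op if the folder already exists
tar -czf "researchblocks-backup-$(date +%Y%m%d).tar.gz" data/db
```

Restore by extracting the archive back into the project folder.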
ResearchBlocks expects an OpenAI-compatible HTTP API. Many people use llama.cpp's `llama-server`.
The steps below are a minimum-viable setup (small model, moderate context, CPU-friendly). If your machine has more RAM, VRAM, or GPU, you can scale the model and context—see Upgrading the LLM and context window.
Open llama.cpp Releases and download the archive that matches your OS and CPU, for example:
- Windows — `llama-*-bin-win-*.zip`
- macOS — `llama-*-bin-macos-*.zip` (choose arm64 or x64 as appropriate)
- Linux x64 — e.g. `llama-*-bin-ubuntu-x64.tar.gz`
Extract it somewhere convenient (e.g. ~/llm/llama-… on Mac/Linux, C:\llm\llama-… on Windows). The llama-server binary is usually in the root of the extracted folder (not always under bin/).
Example GGUF (Q4_K_M, ~2.1 GB) — lightweight default for laptops and low RAM; easy to run, still useful for summaries and queries:
```bash
# macOS / Linux (curl)
curl -L -o qwen2.5-3b-q4.gguf "https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF/resolve/main/qwen2.5-3b-instruct-q4_k_m.gguf"
```

On Windows, you can paste the same URL into a browser or use PowerShell `Invoke-WebRequest` to save the file in the same folder as `llama-server.exe`.
macOS / Linux (from the folder that contains llama-server and the .gguf):
```bash
./llama-server -m qwen2.5-3b-q4.gguf -c 4096 -t 4 --host 127.0.0.1 --port 8080
```

Windows (PowerShell or Command Prompt, same folder as `llama-server.exe`):

```powershell
.\llama-server.exe -m qwen2.5-3b-q4.gguf -c 4096 -t 4 --host 127.0.0.1 --port 8080
```

- `-c 4096` — context length (ResearchBlocks benefits from a larger window when your hardware allows).
- `-t N` — CPU threads; set `N` to roughly your physical cores.
- `--host 127.0.0.1` — only this PC can reach the LLM (safest default). Use `0.0.0.0` only if you intentionally want other devices on your LAN to call the LLM (see LAN access).
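A quick way to pick the `-t` value (a sketch; assumes 2 logical CPUs per physical core, which holds on most hyperthreaded machines):

```shell
# Suggest a llama-server -t value from the CPU count (Linux; on macOS,
# `sysctl -n hw.physicalcpu` reports physical cores directly).
threads=$(( $(nproc) / 2 ))
if [ "$threads" -lt 1 ]; then threads=1; fi
echo "try: -t $threads"
```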
Wait until the log shows the server is listening. Quick check:
```bash
curl http://127.0.0.1:8080/health
```

(On Windows without curl, open http://127.0.0.1:8080/health in a browser.)
By default, ResearchBlocks uses http://127.0.0.1:8080. In a second terminal, from the project folder:
```bash
npm start
```

If the LLM runs on another host or port:
macOS / Linux (bash):
```bash
export LLM_URL=http://192.168.0.10:8080
npm start
```

Windows PowerShell:

```powershell
$env:LLM_URL = "http://192.168.0.10:8080"
npm start
```

Windows CMD:

```cmd
set LLM_URL=http://192.168.0.10:8080
npm start
```

When the connection works, an LLM badge appears in the sidebar.
On modest hardware, answers use a compact context. If your model has a larger context window, you can allow more characters:
bash / macOS / Linux:
```bash
LLM_QUERY_CONTEXT_CHARS=3500 npm start
```

PowerShell:

```powershell
$env:LLM_QUERY_CONTEXT_CHARS = "3500"
npm start
```

When the LLM is connected:
- Query → Ask a Question — natural-language answers with citations from your papers.
- Import PDFs — options such as extracting abstracts with the LLM, sorting keywords, and auto-matching BibTeX when matches are ambiguous.
- Paper detail → Keywords — “Sort keywords with LLM” for existing papers.
Everything above is tuned for minimal hardware: a ~3B quant, 4096 tokens of context in llama-server, and ResearchBlocks sending a relatively small passage budget to the model by default (`LLM_QUERY_CONTEXT_CHARS` defaults to 1400 characters when unset—roughly aligned with ~2K-token-class usage; the README example uses 3500 for ~4K-class servers).
If you have headroom, you can push quality and depth in three places: the model, the server context (-c), and how much text ResearchBlocks feeds into answers (LLM_QUERY_CONTEXT_CHARS).
- Browse Hugging Face GGUF collections (or a specific family such as Qwen, Llama, Mistral, Gemma) and pick a larger instruct model: 7B, 14B, 32B, and larger models generally deliver better reasoning and writing; 70B+ and MoE models are viable if you have enough unified RAM or VRAM (and patience on CPU).
- Use the same `llama-server -m your-model.gguf` flow; only the file path and hardware requirements change.
- Prefer GPU builds of llama.cpp when you can (CUDA on NVIDIA, Metal on Apple Silicon)—see the same releases page for variants that match your OS and accelerator.
- Quantization: Q4_K_M stays efficient; Q5_K_M / Q8_0 (or similar) can improve quality if you have spare memory.
- Raise `-c` to what the model supports (check the model card): 8192, 16384, 32768, or more on long-context checkpoints.
- More context = more RAM/VRAM and often slower generation; if the process is killed or swaps badly, use a smaller `-c`, a smaller model, or a smaller quant.
- Keep `-c` in sync with reality: there is no benefit to setting it far above the model’s effective training context.
The app limits how many characters of retrieved passages go into the LLM for Query / synthesis (see lib/llm.js — LLM_QUERY_CONTEXT_CHARS). Rough guidance:
| Server context (`-c`, tokens) | Example `LLM_QUERY_CONTEXT_CHARS` |
|---|---|
| ~2048 | 1400 (default) or leave unset |
| ~4096 | 3000–4000 |
| ~8192+ | 6000–12000+ (increase until answers improve or you hit latency / OOM) |
Set it when starting the app (same patterns as above):
```bash
LLM_QUERY_CONTEXT_CHARS=8000 npm start
```

There is no single “correct” number—tokens ≠ characters, so treat this as “tune until quality and speed feel right.”
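One way to sanity-check a value (an assumption: English text averages roughly 4 characters per token, so dividing the character budget by 4 approximates how many tokens it occupies):

```shell
# Estimate the token cost of a LLM_QUERY_CONTEXT_CHARS budget (~4 chars/token).
chars=3500
echo "~$(( chars / 4 )) tokens of the server's -c window"
```

For `chars=3500` this estimates about 875 tokens, comfortably inside a 4096-token window once the prompt and the answer itself are added.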
- Run the largest GGUF you can load (multi-GPU, high-RAM workstations, or a dedicated inference machine on your LAN with `LLM_URL` pointing to it).
- Point ResearchBlocks at any OpenAI-compatible endpoint (local or self-hosted stack)—not only llama.cpp—as long as it speaks the same HTTP API your setup expects.
To use ResearchBlocks from a phone or another PC on the same Wi‑Fi, you need the host computer’s LAN IP (for example 192.168.1.42).
- Windows: `ipconfig` → IPv4 Address
- macOS / Linux: `ip addr`, `ifconfig`, or System Settings → Network
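On Linux, a one-liner can print the first address directly (a sketch; assumes `hostname -I` is available, as it is on most distros):

```shell
# Print this machine's first non-loopback IPv4 address (Linux).
hostname -I | awk '{print $1}'
```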
Open http://<that-ip>:3000 on the other device.
If it does not load:
- Windows: Allow Node.js or port 3000 in Windows Defender Firewall when prompted, or add an inbound rule for TCP 3000.
- Linux (ufw): `sudo ufw allow 3000/tcp && sudo ufw reload`
- macOS: System Settings → Network → Firewall → Options and allow incoming for Node/your terminal app if needed.
Security: Exposing the app or the LLM on 0.0.0.0 increases risk. Prefer VPN or SSH tunneling for remote access; if you expose services to the internet, use HTTPS, a firewall, and proper access control.
On systemd-based distros you can run the LLM and ResearchBlocks as services.
LLM unit (/etc/systemd/system/researchblocks-llm.service) — adjust User, paths, and the llama-server line for your install:
```ini
[Unit]
Description=ResearchBlocks LLM (llama-server)
After=network.target

[Service]
Type=simple
User=YOUR_USER
WorkingDirectory=/home/YOUR_USER/llm/YOUR_LLAMA_FOLDER
ExecStart=/home/YOUR_USER/llm/YOUR_LLAMA_FOLDER/llama-server -m /home/YOUR_USER/llm/YOUR_LLAMA_FOLDER/qwen2.5-3b-q4.gguf -c 4096 -t 2 --host 127.0.0.1 --port 8080
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

ResearchBlocks unit (`/etc/systemd/system/researchblocks.service`):
```ini
[Unit]
Description=ResearchBlocks
After=network.target researchblocks-llm.service

[Service]
Type=simple
User=YOUR_USER
WorkingDirectory=/home/YOUR_USER/ResearchBlocks
ExecStart=/usr/bin/node /home/YOUR_USER/ResearchBlocks/server.js
Restart=on-failure
RestartSec=5
Environment=NODE_ENV=production
Environment=LLM_URL=http://127.0.0.1:8080

[Install]
WantedBy=multi-user.target
```

Then:
```bash
sudo systemctl daemon-reload
sudo systemctl enable researchblocks-llm researchblocks
sudo systemctl start researchblocks-llm researchblocks
```

If you have about 4 GB RAM and run a 3B model, extra swap can reduce out-of-memory kills (at the cost of speed):
```bash
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile
```
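After `swapon`, you can confirm the swap is active before making it permanent (read-only checks; `swapon --show` falls back to `/proc/swaps` here in case the binary is not on your PATH):

```shell
# List active swap and show the memory/swap summary.
swapon --show || cat /proc/swaps
free -h
```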
```bash
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```

| Issue | What to try |
|---|---|
| `llama-server` not found | Run the command from the extracted release folder where the binary lives. |
| Port already in use | Change the port for llama-server or stop the other app. Check: Linux/macOS `lsof -i :3000` / `lsof -i :8080`; Windows `netstat -ano \| findstr :3000`. |
| LLM slow or OOM | Lower -c (e.g. 2048), reduce -t, use a smaller quant, or add swap (Linux). |
| No LLM badge | Confirm llama-server is running and LLM_URL matches; test /health on the LLM port. |
| Reset everything | Settings → Cleanup database (removes local library data). |
- Import: Add Data → folder or Browse → choose options (abstracts, keywords, BibTeX match) → Import.
- Questions: Query → type a full question → optional topic filter → read the answer and open sources.
- One paper needs work? Open it → Generate (next to Edit) to re-run enrichment with the LLM.
- BibTeX: Import under BibTeX, then link from each paper or use auto-match on import.
| Topic | Details |
|---|---|
| Stack | Node.js, Express, static frontend, file-based data/db/ |
| LLM module | lib/llm.js — chat, abstract, authors, verify, keywords, hierarchy hints, answer synthesis |
| Design notes | docs/LLM-INTEGRATION-REPORT.md |
```
ResearchBlocks/
├── server.js
├── lib/llm.js
├── public/     # UI
├── data/db/    # your database (default)
└── docs/
```
Licensed under the MIT License. Contributions, issues, and forks are welcome.
| Project | Bukkbeek |
| Development | Built with Cursor |
| PDF parsing | pdf-parse |
| BibTeX | bibtex-parse |
| Optional LLM | llama.cpp, GGUF models |
Your research. Your data. Your personal research assistant — on your terms.