BMLibrarian

The Biomedical Researcher's AI Workbench

BMLibrarian is an AI-powered workbench for biomedical researchers, clinicians, and systematic reviewers. It provides evidence-based answers to clinical questions, automated peer-review-style assessment of research papers, and systematic fact-checking of biomedical statements, all powered by local AI models that require no cloud APIs or external services.

Why BMLibrarian?

Evidence-Based Answers to Clinical Questions

Ask questions like "What are the cardiovascular benefits of exercise?" or "Does metformin reduce mortality in diabetic patients?" and receive comprehensive, citation-backed reports synthesizing evidence from the latest biomedical literature.

Automated Research Quality Assessment

Evaluate research papers with the rigor of peer review:

Paper Weight Assessment: Evaluate the evidential weight of studies based on study design, sample size, methodological quality, and risk of bias
PRISMA 2020 Compliance: Assess systematic reviews against the 27-item PRISMA 2020 checklist
PICO Extraction: Automatically extract Population, Intervention, Comparison, and Outcome components for systematic reviews

Robust Fact-Checking

Validate biomedical statements with literature evidence:

Statement Fact-Checker: Evaluate claims like "Vaccine X causes Y" against published literature
PaperChecker: Validate research abstract claims by systematically searching for contradictory evidence
Counterfactual Analysis: Actively search for evidence that contradicts initial findings for balanced conclusions

Works Offline—Critical for Global Health

BMLibrarian is designed for clinicians and researchers working in areas with limited or unreliable internet connectivity:

Runs entirely with local AI models via Ollama—no cloud APIs required
Local database of PubMed and medRxiv publications with full-text PDFs where available
No API keys, subscriptions, or external services needed
Periodic synchronization with PubMed and medRxiv when connected
Complete functionality offline after initial data import

This makes BMLibrarian uniquely valuable for healthcare workers in remote regions, field hospitals, developing nations, or any environment where reliable internet cannot be guaranteed.

Multiple Search Strategies with AI Assistance

BMLibrarian employs sophisticated multi-strategy search capabilities:

Multi-model query generation: Use multiple AI models to generate diverse database queries
Semantic search: Vector-based similarity search using document embeddings
HyDE (Hypothetical Document Embeddings): Generate hypothetical answers to improve search relevance
Keyword extraction: Traditional keyword-based search as fallback
Counterfactual search: Actively search for contradictory evidence

Privacy-Preserving AI

All AI processing happens locally on your hardware:

No data leaves your machine—perfect for sensitive patient data or pre-publication research
No usage tracking or telemetry
Complete control over model selection and parameters

What's New

Latest Features (02/2026):

Paper Reviewer Lab

A comprehensive paper assessment tool that combines all of BMLibrarian's analysis agents into a single unified workflow. Accepts input via DOI, PMID, PDF file, or pasted text.

# Launch the Paper Reviewer Lab
uv run python scripts/paper_reviewer_lab.py

11-Step Assessment Workflow:

Resolve Input (fetch document metadata via DOI/PMID/PDF/text)
Generate Summary (2-3 sentence synopsis)
Extract Hypothesis (identify core claims)
Detect Study Type (classify research methodology)
PICO Analysis (Population/Intervention/Comparison/Outcome)
PRISMA 2020 Assessment (systematic review checklist, if applicable)
Paper Weight Assessment (evidential weight scoring)
Study Quality Assessment (trustworthiness evaluation)
Synthesize Strengths/Weaknesses
Search Contradictory Evidence (optional PubMed search)
Compile Comprehensive Report

Key Features:

Multiple input methods: DOI, PMID, PDF file, pasted abstract text
Real-time workflow progress visualization (PySide6/Qt)
Model selection from available Ollama models
Results display in Markdown and JSON
Export to Markdown, PDF, or JSON

Systematic Literature Review Agent

A complete systematic review automation system with human oversight and audit trails. Conducts AI-assisted literature reviews following PRISMA 2020 guidelines with configurable search strategies, quality assessment, and composite scoring.

# Run a systematic review
uv run python systematic_review_cli.py --question "Effect of statins on CVD prevention" \
    --include "RCTs" "Human studies" --exclude "Animal studies"

# GUI with checkpoint-based resume capability
uv run python systematic_review_gui.py

Key Capabilities:

Multi-strategy search: Semantic, keyword, hybrid, and HyDE queries with PICO analysis
9-phase workflow: Search planning, execution, filtering, scoring, quality assessment, composite scoring, classification, evidence synthesis, reporting
Cochrane/GRADE assessment: Integrated quality assessment with GRADE formatting
Checkpoint-based resume: Save and resume reviews across sessions
Human checkpoints: Interactive mode pauses at key decision points for human review
Quality assessment: Integrates StudyAssessmentAgent, PaperWeightAssessmentAgent, PICOAgent, and PRISMA2020Agent
Complete audit trail: Full reproducibility with JSON, Markdown, CSV, and PRISMA flow diagram outputs
Configurable weights: Customize relevance, quality, recency, and source reliability weights

Europe PMC Full-Text and PDF Import

Download and import full-text articles and PDFs from Europe PMC's Open Access repository.

# List available Europe PMC packages (1,000+ files, ~100 articles each)
uv run python europe_pmc_bulk_cli.py list

# Download and import full-text XML with Markdown conversion
uv run python europe_pmc_bulk_cli.py sync --output-dir ~/europepmc

# Download Open Access PDFs
uv run python europe_pmc_pdf_cli.py download --output-dir ~/europepmc_pdf

Key Features:

JATS XML to Markdown conversion (headers, figures, tables, emphasis)
Resumable downloads with progress tracking
Configurable rate limiting (polite mode)
Year-based PDF organization
PMCID range filtering

PubMed Search Lab

Interactive PubMed search directly via the PubMed API, without requiring a local database.

uv run python scripts/pubmed_search_lab.py

Key Features:

Natural language to PubMed query conversion
MeSH term lookup and expansion
Search results display with abstracts
No local database required

Audit Trail Validation GUI

A human review interface for validating automated evaluations in the systematic review audit trail.

uv run python audit_validation_gui.py --user alice
uv run python audit_validation_gui.py --user alice --incremental

Key Features:

Tab-per-step organization for Queries, Scores, Citations, Reports, and Counterfactuals
Validation statuses: Validated, Incorrect, Uncertain, or Needs Review
Error categorization with 25+ predefined categories
Statistics dashboard with reviewer performance tracking
Multi-reviewer support for inter-rater reliability studies

Citation-Aware Writing Editor

A markdown editor with integrated citation management for academic writing.

Key Features:

Citation markers with [@id:12345:Smith2023] format
Semantic search for finding references
Multiple citation styles: Vancouver, APA, Harvard, Chicago
Autosave with version history
Export with formatted reference lists
PostgreSQL-backed document persistence
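
As a rough illustration of the citation marker format listed above, the following sketch pulls markers out of Markdown text with a regular expression. It assumes the middle field is a numeric document id and the last field is a short label; `CITATION_RE` and `find_citations` are hypothetical names, not part of BMLibrarian's API.

```python
import re

# Assumed marker shape: [@id:<numeric id>:<label>], e.g. [@id:12345:Smith2023]
CITATION_RE = re.compile(r"\[@id:(\d+):([^\]\s:]+)\]")


def find_citations(text: str) -> list[tuple[int, str]]:
    """Return (document_id, label) pairs in order of appearance."""
    return [(int(doc_id), label) for doc_id, label in CITATION_RE.findall(text)]


sample = (
    "Exercise lowers blood pressure [@id:12345:Smith2023] "
    "and LDL cholesterol [@id:678:Lee2021]."
)
print(find_citations(sample))
```

A parser along these lines would let an export step replace each marker with a numbered or author-year reference, depending on the chosen citation style.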

Other Recent Features

Paper Weight Assessment: Evaluate research papers across five quality dimensions (study design, sample size, methodology, bias risk, replication)
PICO Extraction: Automatically extract Population, Intervention, Comparison, and Outcome for systematic reviews
PRISMA 2020 Compliance: Assess systematic reviews against the full 27-item PRISMA 2020 checklist
Document Interrogation: Interactive Q&A interface for asking questions about loaded PDF, Markdown, or text documents
Full-Text PDF Discovery: Automated discovery and download from PMC, Unpaywall, DOI resolution, and OpenAthens
Transparency Assessment: Detect undisclosed bias risk by evaluating funding disclosure, COI statements, data availability, trial registration, and author contributions (CLI, Lab GUI, bulk metadata enrichment)
PaperChecker System: Fact-check medical abstracts by searching for contradictory literature evidence
Fact Checker System: LLM training data auditing with literature validation (CLI, desktop GUI, blind mode, incremental mode, SQLite integration)
Multi-Model Query Generation: Use up to 3 AI models simultaneously to retrieve 20-40% more relevant documents
Semantic Chunking: Multiple chunking strategies (adaptive, sentence-based, SpaCy NLP) with vector embeddings for improved retrieval
LLM Provider Abstraction: Unified interface across multiple LLM providers with token tracking
Thesaurus/MeSH Expansion: Term expansion and synonym lookup for improved search coverage
User Authentication: Login system with per-user database-backed settings
PubMed Download Repair: CLI for detecting and fixing corrupted gzip files in bulk downloads
PostgreSQL Audit Trail: Complete persistent tracking of research workflow sessions
Automatic Database Migrations: Zero-configuration schema updates on startup

Overview

BMLibrarian transforms how researchers interact with biomedical literature by combining AI-powered natural language processing with robust database infrastructure. The system employs multiple specialized AI agents that work together to convert research questions into comprehensive, evidence-based medical reports with proper citations and balanced analysis of contradictory evidence.

ARCHITECTURAL SCALE

Codebase Statistics

728 Python files organized in hierarchical module structure
1,390 classes implementing specialized functionality
9,671 functions providing granular capabilities
~211,000 lines of code (excluding comments, docstrings, and blank lines; ~298,000 total)
~8,800 docstrings for comprehensive documentation
145 test files with comprehensive test coverage
272 documentation files (Markdown)
100% type hints for all public APIs and data structures
26 top-level CLI/GUI applications
17 GUI plugins in the Qt plugin system

Comparison to Established Systems

System	Lines of Code	Domain	Status
Redis	~30,000	Database	Production
nginx	~100,000	Web server	Production
Django	~300,000	Web framework	Production
BMLibrarian	~211,000	Biomedical AI	Production-ready

By line count, BMLibrarian is larger than Redis and nginx and approaches the size of Django, all of which are mature, widely deployed projects.

WHAT THIS SCALE REPRESENTS

Not a PhD Side Project — Infrastructure Software

Multi-layer architecture:

Core database layer: PostgreSQL integration with custom query optimization
Vector search layer: pgvector integration with HNSW indexing at 40M+ document scale
Agent orchestration layer: 15+ specialized AI agents with sophisticated coordination
Workflow management layer: Persistent task queuing, state management, error recovery
Multiple user interfaces: CLI, desktop GUI (PySide6/Qt), laboratory tools
Full-text discovery system: Multi-source PDF retrieval with browser automation
Semantic chunking system: Multiple chunking strategies with vector embeddings
LLM provider abstraction: Unified interface with token tracking across providers
Research quality assessment: PRISMA 2020, PICO extraction, study design evaluation, paper weight scoring
Fact-checking infrastructure: Statement validation, training data auditing, abstract fact-checking
Systematic review automation: Checkpoint-based reviews with Cochrane/GRADE assessment
Configuration management: Hierarchical config system with database-backed user settings
User authentication: Login system with per-user settings and session management
Database migrations: Automatic schema updates with version tracking
Comprehensive documentation: 272 markdown files covering user guides + developer docs

Development Methodology

Professional software engineering practices:

Type hints throughout (Python 3.12+)
Comprehensive unit testing (145 test files)
Modular architecture with clear separation of concerns
Configuration-driven design (no hardcoded parameters)
Extensive error handling and logging
Database transaction management and connection pooling
Async/parallel processing where appropriate
GUI/CLI separation for testability
Plugin architecture for extensibility (17 GUI plugins)

Fact Checker System

The BMLibrarian Fact Checker is a specialized tool for auditing biomedical statements in LLM training datasets, medical knowledge bases, and research claims. It evaluates statement veracity by searching literature databases and comparing claims against published evidence.

Core Capabilities

Automated Verification: Evaluates biomedical statements as yes/no/maybe based on literature evidence
Evidence Extraction: Provides specific citations with stance indicators (supports/contradicts/neutral)
Batch Processing: Process hundreds of statements from JSON input files
Confidence Assessment: Rates confidence (high/medium/low) based on evidence strength and consistency
Citation Validation: Prevents hallucination by validating all citations reference real database documents
Human Review Interface: Desktop GUI for annotation, comparison, and quality control

Key Features

CLI Tool (`fact_checker_cli.py`)

Batch fact-checking from JSON input files
Incremental processing - smart detection of previously evaluated statements
SQLite database storage for persistent results and annotations
Flexible thresholds for relevance scoring and citation extraction
Quick mode for faster testing with reduced document sets
Detailed output with evidence metadata and validation statistics

Review GUI (`fact_checker_review_gui.py`)

Interactive human review with statement-by-statement navigation
Blind mode - hide AI evaluations to prevent bias during human annotation
Incremental mode - filter to show only unannotated statements for efficient review
Database integration - automatic SQLite database creation from JSON files
Intelligent merging - import new statements without overwriting existing annotations
Citation inspection - expandable cards with full abstracts and highlighted passages
Multi-user support - track annotations by different reviewers
Export functionality - save human-annotated results for analysis

Use Cases

LLM Training Data Auditing: Verify factual accuracy of biomedical statements in training datasets
Medical Knowledge Validation: Check medical claims against current literature
Dataset Quality Control: Identify potentially incorrect statements in medical corpora
Evidence-Based Verification: Validate medical facts with specific literature references
Research Claim Verification: Evaluate research statements before publication

Database Workflow

The fact checker uses SQLite databases for persistent storage:

First run with JSON: Creates .db file alongside input JSON (e.g., results.json → results.db)
Subsequent runs: Intelligently merges new statements from JSON without overwriting existing evaluations/annotations
Real-time persistence: All AI evaluations and human annotations saved immediately to database
Incremental processing: Skip already-evaluated statements with --incremental flag
Cross-tool compatibility: CLI and GUI share the same database format
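
The merge behaviour described above (new statements are added, existing evaluations and annotations are left alone) can be sketched with the standard `sqlite3` module. The table layout and the `merge_statements` helper below are assumptions made for illustration; the actual fact-checker schema may differ.

```python
import sqlite3

# Hypothetical schema; the real fact-checker database layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS statements (
    statement  TEXT PRIMARY KEY,
    expected   TEXT,
    evaluation TEXT,
    annotation TEXT
)
"""


def merge_statements(conn: sqlite3.Connection, items: list[dict]) -> int:
    """Insert statements that are not stored yet; return how many were added."""
    conn.execute(SCHEMA)
    before = conn.execute("SELECT COUNT(*) FROM statements").fetchone()[0]
    # INSERT OR IGNORE skips rows whose primary key already exists,
    # so stored evaluations and annotations are never overwritten.
    conn.executemany(
        "INSERT OR IGNORE INTO statements (statement, expected) VALUES (?, ?)",
        [(item["statement"], item.get("answer")) for item in items],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM statements").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
first = merge_statements(
    conn, [{"statement": "Vitamin D deficiency is common in IBD", "answer": "yes"}]
)
# Simulate a human annotation made between two runs.
conn.execute("UPDATE statements SET annotation = 'yes'")
second = merge_statements(
    conn,
    [
        {"statement": "Vitamin D deficiency is common in IBD", "answer": "yes"},
        {"statement": "All cases of childhood UC require colectomy", "answer": "no"},
    ],
)
print(first, second)
```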

Example Workflow

# Step 1: Generate fact-check results from statements
uv run python fact_checker_cli.py statements.json -o results.json
# Creates: results.json (JSON output) and results.db (SQLite database)

# Step 2: Review with GUI in blind mode (no AI bias)
uv run python fact_checker_review_gui.py --input-file results.db --blind --user alice
# Human reviewer annotates statements without seeing AI evaluations

# Step 3: Review remaining statements in incremental mode (without --blind)
uv run python fact_checker_review_gui.py --input-file results.db --incremental --user alice
# Shows only statements not yet annotated by alice

# Step 4: Export annotated results
# Use GUI "Save Reviews" button → results_annotated.json

# Step 5: Analyze results
uv run python analyze_factcheck_progress.py results_annotated.json

PaperChecker System

The BMLibrarian PaperChecker is a sophisticated fact-checking system for medical abstracts that validates research claims by systematically searching for and analyzing contradictory evidence.

Core Capabilities

Statement Extraction: Identifies core research claims (hypothesis, finding, conclusion) from abstracts
Counter-Evidence Search: Multi-strategy search (semantic + HyDE + keyword) for contradictory literature
Evidence-Based Verdicts: Three-class classification (supports/contradicts/undecided) with confidence levels
Complete Audit Trail: Full provenance tracking from search to final verdict
Batch Processing: CLI for processing multiple abstracts with database persistence

Key Features

CLI Tool (`paper_checker_cli.py`)

Batch fact-checking of medical abstracts from JSON or by PMID
Multi-strategy search combining semantic, HyDE, and keyword approaches
Counter-report generation synthesizing contradictory evidence
Markdown export for detailed reports per abstract
Database persistence in PostgreSQL papercheck schema

Laboratory GUI (`paper_checker_lab.py`)

Interactive testing with step-by-step workflow visualization
Real-time progress showing each processing stage
Results inspection for all intermediate outputs
Native desktop application (PySide6/Qt)

Workflow Overview

Abstract → Statement Extraction → Counter-Statement Generation →
Multi-Strategy Search → Document Scoring → Citation Extraction →
Counter-Report Generation → Verdict Analysis → JSON/Markdown Output

Example Usage

# Check abstracts from JSON file
uv run python paper_checker_cli.py abstracts.json -o results.json

# Export detailed markdown reports
uv run python paper_checker_cli.py abstracts.json --export-markdown reports/

# Check abstracts by PMID from database
uv run python paper_checker_cli.py --pmid 12345678 23456789

# Quick mode for testing
uv run python paper_checker_cli.py abstracts.json --quick

# Interactive laboratory
uv run python paper_checker_lab.py

Documentation

User Guide - Overview and quick start
CLI Guide - Command-line reference
Laboratory Guide - Interactive testing
Architecture - System design

Paper Weight Assessment

The Paper Weight Assessment system evaluates the evidential strength of biomedical research papers based on multiple dimensions, providing a comprehensive quality score that helps researchers and clinicians assess how much weight to give to study findings.

Assessment Dimensions

Dimension	Weight	What It Evaluates
Study Design	25%	Research methodology (RCT, cohort, case-control, etc.)
Sample Size	15%	Statistical power, confidence intervals, power calculations
Methodological Quality	30%	Randomization, blinding, protocol registration, ITT analysis
Risk of Bias	20%	Selection, performance, detection, and reporting biases
Replication Status	10%	Whether findings have been replicated by other studies

Example Usage

# Launch the Paper Weight Laboratory (GUI)
uv run python paper_weight_lab.py

# Features:
# - Search documents by PMID, DOI, or title
# - Real-time assessment progress tracking
# - Detailed audit trail for each dimension
# - Configurable dimension weights
# - Export to Markdown or JSON

Documentation

User Guide - Complete laboratory guide

PICO Extraction System

The PICO Agent extracts structured components from biomedical research papers using the PICO framework—essential for systematic reviews and evidence-based medicine.

What is PICO?

Population: Who was studied? (demographics, condition, setting)
Intervention: What was done? (treatment, test, exposure)
Comparison: What was the control? (placebo, alternative treatment)
Outcome: What was measured? (effects, results, endpoints)

Example Usage

from bmlibrarian.agents import PICOAgent

agent = PICOAgent(model="gpt-oss:20b")
extraction = agent.extract_pico_from_document(document)

print(f"Population: {extraction.population}")
print(f"Intervention: {extraction.intervention}")
print(f"Comparison: {extraction.comparison}")
print(f"Outcome: {extraction.outcome}")
print(f"Confidence: {extraction.extraction_confidence:.1%}")

# Interactive PICO Laboratory
uv run python pico_lab.py

# Batch process documents
# Export to CSV for systematic review tools (Covidence, DistillerSR)

Use Cases

Systematic Reviews: Rapidly extract PICO from hundreds of papers
Meta-Analysis: Standardize study data for quantitative synthesis
Research Gap Analysis: Identify understudied populations or outcomes
Grant Writing: Structure research questions using evidence-based frameworks

Documentation

User Guide - Complete PICO extraction guide
Developer Documentation - API reference

PRISMA 2020 Compliance Assessment

The PRISMA 2020 Agent assesses systematic reviews and meta-analyses against the PRISMA 2020 (Preferred Reporting Items for Systematic reviews and Meta-Analyses) 27-item checklist.

Assessment Process

Suitability Check: Automatically determines if the document is a systematic review or meta-analysis
27-Item Assessment: Evaluates all PRISMA checklist items with detailed explanations
Compliance Scoring: Provides overall compliance percentage and category

Scoring System

Score	Category	Interpretation
90-100%	Excellent	Outstanding adherence to PRISMA 2020
75-89%	Good	Strong reporting with minor gaps
60-74%	Adequate	Acceptable with room for improvement
40-59%	Poor	Significant reporting deficiencies
0-39%	Very Poor	Major reporting failures
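
The score bands in the table translate directly into a small lookup. The sketch below is illustrative only: `prisma_category` is a hypothetical helper, and treating fractional values such as 89.5 as belonging to the lower band is an assumption.

```python
def prisma_category(percent: float) -> str:
    """Map an overall compliance percentage to the category from the table."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent >= 90:
        return "Excellent"
    if percent >= 75:
        return "Good"
    if percent >= 60:
        return "Adequate"
    if percent >= 40:
        return "Poor"
    return "Very Poor"


for value in (95, 80, 62.5, 41, 10):
    print(value, prisma_category(value))
```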

Example Usage

# Launch the PRISMA 2020 Laboratory (GUI)
uv run python prisma2020_lab.py

# Features:
# - Automatic suitability screening
# - Color-coded compliance cards for each item
# - Export assessments to JSON or CSV
# - Batch processing multiple reviews

Use Cases

Self-assessment before submitting systematic reviews to journals
Peer review of systematic review manuscripts
Editorial screening for journal submissions
Training on PRISMA 2020 standards

Documentation

User Guide - Complete assessment guide
Developer Documentation - System architecture

Document Interrogation

The Document Interrogation interface provides an interactive chat experience for asking questions about loaded documents (PDFs, Markdown, or text files).

Features

Split-pane interface: Document viewer (60%) and chat interface (40%)
Multiple document formats: PDF, Markdown (.md), text (.txt)
Dialogue-style chat: User and AI messages in distinct bubbles
Full conversation history: Scrollable message history
Model selection: Choose any available Ollama model

Example Usage

# Launch the Configuration GUI (includes Document Interrogation tab)
uv run python bmlibrarian_config_gui.py

# Workflow:
# 1. Navigate to "Document Interrogation" tab
# 2. Load a document (PDF, MD, or TXT)
# 3. Select an Ollama model
# 4. Ask questions about the document

Example Questions

"What are the main findings of this study?"
"What methods did the authors use?"
"Are there any limitations mentioned?"
"Summarize the introduction section"

Documentation

User Guide - Complete usage guide

Full-Text PDF Discovery

The Full-Text Discovery system automatically finds and downloads PDF versions of academic papers through legal open access channels.

Discovery Sources (in priority order)

PubMed Central (PMC) - Verified open access repository
Unpaywall - Open access aggregator (millions of papers)
DOI Resolution - CrossRef and doi.org content negotiation
Direct URL - Existing PDF URLs from database
OpenAthens - Institutional proxy (if configured)

Example Usage

from bmlibrarian.discovery import FullTextFinder, DocumentIdentifiers

# Create finder with Unpaywall email
finder = FullTextFinder(unpaywall_email="your@email.com")

# Discover PDF sources
identifiers = DocumentIdentifiers(doi="10.1038/nature12373")
result = finder.discover(identifiers)

if result.best_source:
    print(f"Found: {result.best_source.url}")
    print(f"Access: {result.best_source.access_type.value}")

# Download PDFs for documents in database
uv run python -c "from bmlibrarian.discovery import download_pdf_for_document; ..."

Key Features

Multi-source discovery: Searches PMC, Unpaywall, CrossRef, DOI.org
Priority-based selection: Automatically selects best source (open access preferred)
Browser fallback: Handles Cloudflare and anti-bot protections via Playwright
Year-based organization: PDFs stored in YYYY/filename.pdf structure
Database integration: Automatically updates document records with PDF paths
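
The YYYY/filename.pdf layout mentioned above can be expressed with `pathlib`. This is only a sketch of the directory convention; the `pdf_path` helper and its handling of file names are assumptions.

```python
from pathlib import Path


def pdf_path(base_dir: Path, year: int, filename: str) -> Path:
    """Build <base_dir>/<YYYY>/<filename>.pdf for a downloaded article."""
    safe_name = Path(filename).name  # drop any directory components
    if not safe_name.lower().endswith(".pdf"):
        safe_name += ".pdf"
    return base_dir / f"{year:04d}" / safe_name


target = pdf_path(Path("knowledgebase/pdf"), 2023, "smith_2023_exercise.pdf")
print(target.as_posix())
```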

Documentation

User Guide - Complete discovery guide
Developer Documentation - System architecture

Key Features

Multi-Agent AI System

QueryAgent: Natural language to PostgreSQL query conversion
SemanticQueryAgent: Vector-based semantic search with embeddings
DocumentScoringAgent: Relevance scoring for research questions (1-5 scale)
CitationFinderAgent: Extracts relevant passages from high-scoring documents
ReportingAgent: Synthesizes citations into medical publication-style reports
CounterfactualAgent: Analyzes documents to generate research questions for finding contradictory evidence
EditorAgent: Creates balanced comprehensive reports integrating all evidence
FactCheckerAgent: Evaluates biomedical statements (yes/no/maybe) with literature evidence
PaperCheckerAgent: Validates medical abstract claims against contradictory literature evidence
PaperReviewerAgent: Comprehensive paper assessment combining all analysis agents in an 11-step workflow
PICOAgent: Extracts Population, Intervention, Comparison, and Outcome components
PRISMA2020Agent: Assesses systematic reviews against the 27-item PRISMA 2020 checklist
StudyAssessmentAgent: Evaluates research quality, study design, and bias risk
PaperWeightAssessmentAgent: Evidential weight scoring across five quality dimensions
TransparencyAgent: Undisclosed bias risk detection via funding, COI, data availability, and trial registration analysis
DocumentInterrogationAgent: Interactive Q&A with loaded documents (PDF, Markdown, text)
SystematicReviewAgent: Automated systematic literature review with Cochrane/GRADE assessment

Advanced Workflow Orchestration

Enum-Based Workflow: Flexible step orchestration with meaningful names
Iterative Processing: Query refinement, threshold adjustment, citation requests
Task Queue System: SQLite-based persistent task queuing for memory-efficient processing
Human-in-the-Loop: Interactive decision points with auto-mode support
Branching Logic: Conditional step execution and error recovery

Production-Ready Infrastructure

Database Migration System: Automated schema initialization and incremental updates with startup integration
PostgreSQL + pgvector: Semantic search with vector embeddings at 40M+ document scale
Semantic Chunking: Multiple strategies (adaptive, sentence-based, SpaCy NLP) with vector embeddings
PostgreSQL Audit Trail: Comprehensive tracking of research workflow sessions
User Authentication: Login system with per-user database-backed settings
LLM Provider Abstraction: Unified interface with token tracking across providers
Local LLM Integration: Ollama service for privacy-preserving AI inference
145 Test Files: Comprehensive test coverage across all modules
17 GUI Plugins: Modular PySide6/Qt plugin architecture
Browser-Based Downloader: Playwright automation for Cloudflare-protected PDFs (optional)

Advanced Analytics

Multi-Model Query Generation: Use multiple AI models (up to 3) to generate diverse database queries for 20-40% improved document retrieval
Query Performance Tracking: Real-time analysis of which models and parameters find the most relevant documents
Counterfactual Analysis: Systematic search for contradictory evidence with progressive audit trail
Evidence Strength Assessment: Quality evaluation with citation validation and rejection reasoning
Temporal Precision: Specific year references instead of vague temporal terms
Document Verification: Real database ID validation to prevent hallucination
Citation Validation: AI-powered verification that citations actually support counterfactual claims
User Override Capability: Expert users can override AI rejection decisions with custom reasoning
Research Workflow Audit Trail: PostgreSQL-based persistent tracking of complete research sessions

Quick Start

Installation

# Clone the repository
git clone https://github.com/hherb/bmlibrarian.git
cd bmlibrarian

# Install dependencies using uv (recommended)
uv sync

Prerequisites

Python: 3.12+ (required for modern type hints and performance)
Database: PostgreSQL 12+ with pgvector extension
AI/LLM: Ollama server for local language model inference
Extensions: pgvector for vector similarity (semantic) search, pg_trgm for trigram-based text matching

Environment Setup

Configure database and AI settings:

# Create .env file in project directory
cat > .env << EOF
# Database Configuration
POSTGRES_USER=your_username
POSTGRES_PASSWORD=your_password
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=knowledgebase

# File System
PDF_BASE_DIR=~/knowledgebase/pdf

# AI/LLM Configuration (Ollama typically runs on localhost:11434)
OLLAMA_BASE_URL=http://localhost:11434
EOF

Start required services:

# Start Ollama service (for AI inference)
ollama serve

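# Ensure PostgreSQL is running and the extensions exist in the target database
# (extensions are created per database; "knowledgebase" matches POSTGRES_DB above)
psql -d knowledgebase -c "CREATE EXTENSION IF NOT EXISTS vector;"
psql -d knowledgebase -c "CREATE EXTENSION IF NOT EXISTS pg_trgm;"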

Usage Examples

Interactive Research CLI

# Start the comprehensive medical research CLI
uv run python bmlibrarian_cli.py

# Quick testing mode
uv run python bmlibrarian_cli.py --quick

# Automated mode with research question
uv run python bmlibrarian_cli.py --auto "What are the cardiovascular benefits of exercise?"

Desktop Research Application

# Launch the GUI research application
uv run python bmlibrarian_research_gui.py

# Features:
# - Visual workflow progress with collapsible step cards
# - Real-time agent execution with model configuration
# - Multi-model query generation with smart pagination and result tracking
# - Query performance statistics showing model effectiveness
# - Progressive counterfactual audit trail showing claims, queries, searches, and results
# - Formatted markdown report preview with scrolling
# - Direct file save functionality
# - Complete transparency into citation validation and rejection reasoning
# - Automatic audit trail persistence to PostgreSQL database

Configuration GUI

# Launch the configuration interface
uv run python bmlibrarian_config_gui.py

# Configure agents, models, and parameters through GUI:
# - Model selection with live refresh from Ollama
# - Parameter tuning with interactive sliders
# - Multi-model query generation configuration tab
# - Connection testing and validation
# - Visual value displays for all configuration parameters

Fact Checker CLI for LLM Training Data Auditing

# Check biomedical statements against literature evidence
uv run python fact_checker_cli.py input.json -o results.json

# Input format (input.json):
# [
#   {"statement": "All cases of childhood UC require colectomy", "answer": "no"},
#   {"statement": "Vitamin D deficiency is common in IBD", "answer": "yes"}
# ]

# This creates TWO outputs:
#   - results.json: JSON file with fact-check results
#   - results.db: SQLite database for persistent storage

# Incremental mode - skip already-evaluated statements
uv run python fact_checker_cli.py input.json -o results.json --incremental
# Only processes new statements, preserves existing evaluations

# Quick mode for faster testing
uv run python fact_checker_cli.py input.json -o results.json --quick

# Custom thresholds for precision control
uv run python fact_checker_cli.py input.json -o results.json \
  --score-threshold 3.0 --max-search-results 100 --max-citations 15

# Verbose mode with detailed output
uv run python fact_checker_cli.py input.json -o results.json -v --detailed

# Custom model selection
uv run python fact_checker_cli.py input.json -o results.json \
  --model medgemma-27b-text-it-Q8_0:latest --temperature 0.15

# Run demonstration
uv run python examples/fact_checker_demo.py
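
If the statements come from another dataset, the input file in the format shown above can be generated programmatically. A minimal sketch: `write_statements` is a hypothetical helper, and accepting "maybe" as an expected answer in the input is an assumption (the example input only shows "yes" and "no").

```python
import json
import tempfile
from pathlib import Path

# "maybe" is assumed to be acceptable; the documented example uses yes/no only.
ALLOWED_ANSWERS = {"yes", "no", "maybe"}


def write_statements(path: Path, statements: list[tuple[str, str]]) -> int:
    """Write (statement, expected answer) pairs as a fact-checker input file."""
    records = []
    for text, answer in statements:
        if answer not in ALLOWED_ANSWERS:
            raise ValueError(f"unexpected answer {answer!r} for {text!r}")
        records.append({"statement": text, "answer": answer})
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return len(records)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "input.json"
    count = write_statements(
        target,
        [
            ("All cases of childhood UC require colectomy", "no"),
            ("Vitamin D deficiency is common in IBD", "yes"),
        ],
    )
    loaded = json.loads(target.read_text(encoding="utf-8"))

print(count, loaded[0]["answer"])
```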

Fact-Checker Review GUI

# Human review and annotation of fact-checking results
uv run python fact_checker_review_gui.py

# Load JSON file (auto-creates SQLite database for annotations)
uv run python fact_checker_review_gui.py --input-file results.json

# Load existing database directly
uv run python fact_checker_review_gui.py --input-file results.db

# BLIND MODE - hide AI evaluations to prevent annotation bias
uv run python fact_checker_review_gui.py --input-file results.db --blind --user alice
# Perfect for unbiased human annotation without AI influence

# INCREMENTAL MODE - show only unannotated statements
uv run python fact_checker_review_gui.py --input-file results.db --incremental --user alice
# Efficiently review only statements you haven't annotated yet

# Multi-user workflow with user tracking
uv run python fact_checker_review_gui.py --input-file results.db --user bob
# Track annotations by different reviewers

# Features:
# - Automatic SQLite database creation from JSON files
# - Intelligent merging: import new statements without overwriting existing annotations
# - Real-time persistence: all annotations saved immediately to database
# - Statement-by-statement review with progress tracking
# - Compare original, AI, and human annotations side-by-side
# - Expandable citation cards with full abstracts and highlighted passages
# - Color-coded stance indicators (supports/contradicts/neutral)
# - Blind mode for unbiased annotation (hide AI evaluations)
# - Incremental mode for efficient review (filter unannotated statements)
# - Multi-user support with annotator metadata
# - Export reviewed annotations to JSON file
# - Perfect for quality control and training data validation
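
The "intelligent merging" behaviour listed above can be pictured as an insert-if-absent import. The sketch below is an illustration only: the table layout, column names, and the `evaluation` key are assumptions, not the schema the review GUI actually uses.

```python
import sqlite3


def open_review_db(path=":memory:"):
    """Open (or create) a hypothetical review database keyed by statement text."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS statements ("
        "statement TEXT PRIMARY KEY, "
        "ai_evaluation TEXT, "
        "human_annotation TEXT)"
    )
    return conn


def count_statements(conn):
    return conn.execute("SELECT COUNT(*) FROM statements").fetchone()[0]


def import_results(conn, results):
    """Insert unseen statements only; existing rows and their annotations are left alone."""
    before = count_statements(conn)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO statements (statement, ai_evaluation) VALUES (?, ?)",
            [(r["statement"], r.get("evaluation")) for r in results],
        )
    return count_statements(conn) - before


def annotate(conn, statement, annotation):
    """Persist a human annotation immediately."""
    with conn:
        conn.execute(
            "UPDATE statements SET human_annotation = ? WHERE statement = ?",
            (annotation, statement),
        )
```

Because the import uses `INSERT OR IGNORE`, re-importing a results file after annotation adds only the new statements and never overwrites a reviewer's work.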

Browser-Based PDF Download (Optional)

For PDFs protected by Cloudflare or anti-bot measures:

# Install browser automation support (optional)
# Assumes the project defines a "browser" extra in pyproject.toml
uv sync --extra browser
uv run python -m playwright install chromium

# Download PDFs using browser automation
uv run python download_pdfs_with_browser.py --batch-size 20

# Run with visible browser (for debugging)
uv run python download_pdfs_with_browser.py --visible

# Test the browser downloader
uv run python test_browser_download.py

See BROWSER_DOWNLOADER.md for detailed documentation on:

Cloudflare bypass techniques
CAPTCHA handling
Stealth mode configuration
Performance optimization

Multi-Agent Workflow (Programmatic)

from bmlibrarian.agents import (
    QueryAgent, DocumentScoringAgent, CitationFinderAgent, 
    ReportingAgent, CounterfactualAgent, EditorAgent, 
    AgentOrchestrator
)
from bmlibrarian.cli.workflow_steps import (
    create_default_research_workflow, WorkflowExecutor
)

# Initialize orchestration system
orchestrator = AgentOrchestrator(max_workers=4)
workflow = create_default_research_workflow()
executor = WorkflowExecutor(workflow)

# Initialize specialized agents
query_agent = QueryAgent(orchestrator=orchestrator)
scoring_agent = DocumentScoringAgent(orchestrator=orchestrator)
citation_agent = CitationFinderAgent(orchestrator=orchestrator)
reporting_agent = ReportingAgent(orchestrator=orchestrator)
counterfactual_agent = CounterfactualAgent(orchestrator=orchestrator)
editor_agent = EditorAgent(orchestrator=orchestrator)

# Execute research workflow
research_question = "What are the cardiovascular benefits of exercise?"
executor.add_context('research_question', research_question)

# The workflow handles: query generation, document search, scoring,
# citation extraction, report generation, and counterfactual analysis.
# Once the workflow has run, the final report is read from the shared context:
final_report = executor.get_context('comprehensive_report')

Architecture Overview

Multi-Agent System Architecture

BMLibrarian uses a multi-agent architecture in which specialized AI agents each handle one stage of literature processing and pass their results to the next:

graph TD
    A[Research Question] --> B[QueryAgent]
    B --> C[Database Search]
    C --> D[DocumentScoringAgent]
    D --> E[CitationFinderAgent]
    E --> F[ReportingAgent]
    F --> G{Counterfactual Analysis?}
    G -->|Yes| H[CounterfactualAgent]
    G -->|No| I[EditorAgent]
    H --> J[Contradictory Evidence Search]
    J --> I
    I --> K[Comprehensive Report]

Workflow Orchestration System

The enum-based workflow system provides flexible step orchestration:

WorkflowStep Enum: Meaningful step names instead of brittle numbering
Repeatable Steps: Query refinement, threshold adjustment, citation requests
Branching Logic: Conditional step execution and error recovery
Context Management: State preservation across step executions
Auto Mode Support: Graceful handling of non-interactive execution
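
To make the ideas above concrete, here is a small self-contained sketch of enum-based orchestration with a repeatable step, branching, and a shared context. It does not use BMLibrarian's own classes: `DemoStep`, `run_workflow`, and the handler functions are hypothetical names invented for this illustration.

```python
from enum import Enum, auto
from typing import Callable, Dict, Optional


class DemoStep(Enum):
    # Hypothetical step names; the real workflow defines its own steps.
    REFINE_QUERY = auto()
    SEARCH = auto()
    SCORE = auto()
    REPORT = auto()


Handler = Callable[[dict], Optional[DemoStep]]


def run_workflow(handlers: Dict[DemoStep, Handler], start: DemoStep,
                 context: dict, max_steps: int = 20) -> list:
    """Run handlers until one returns None; return the visited steps in order."""
    visited = []
    step = start
    while step is not None:
        if len(visited) >= max_steps:
            raise RuntimeError("workflow did not terminate")
        visited.append(step)
        step = handlers[step](context)  # each handler picks the next step
    return visited


def refine_query(ctx):
    ctx["attempts"] = ctx.get("attempts", 0) + 1
    ctx["query"] = f"{ctx['question']} (attempt {ctx['attempts']})"
    return DemoStep.SEARCH


def search(ctx):
    # Pretend the first attempt finds nothing, forcing a branch back to refinement.
    ctx["documents"] = [] if ctx["attempts"] < 2 else ["doc-1", "doc-2"]
    return DemoStep.SCORE if ctx["documents"] else DemoStep.REFINE_QUERY


def score(ctx):
    ctx["scores"] = {doc: 3.0 for doc in ctx["documents"]}
    return DemoStep.REPORT


def report(ctx):
    ctx["report"] = f"{len(ctx['scores'])} documents scored for: {ctx['question']}"
    return None  # end of workflow


HANDLERS = {
    DemoStep.REFINE_QUERY: refine_query,
    DemoStep.SEARCH: search,
    DemoStep.SCORE: score,
    DemoStep.REPORT: report,
}
```

Calling `run_workflow(HANDLERS, DemoStep.REFINE_QUERY, {"question": "..."})` visits the refinement and search steps twice before scoring. Named steps make that loop easy to express, whereas a fixed numeric sequence would need renumbering whenever a step is inserted.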

Task Queue System

QueueManager: SQLite-based persistent task queuing
AgentOrchestrator: Coordinates multi-agent workflows
Task Priorities: HIGH, NORMAL, LOW priority levels
Batch Processing: Memory-efficient handling of large document sets
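
As a rough illustration of how a persistent, prioritized queue can work, the sketch below stores tasks in SQLite and hands them out in batches. It is not the actual QueueManager: `MiniQueue`, its table layout, and the numeric values behind the priority levels are assumptions made for the example.

```python
import sqlite3
from enum import IntEnum


class Priority(IntEnum):
    # Lower number sorts first; the numeric values are an assumption.
    HIGH = 0
    NORMAL = 1
    LOW = 2


class MiniQueue:
    """Hypothetical SQLite-backed task queue with priority ordering."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS tasks ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "payload TEXT NOT NULL, "
            "priority INTEGER NOT NULL, "
            "status TEXT NOT NULL DEFAULT 'pending')"
        )

    def put(self, payload, priority=Priority.NORMAL):
        """Store a task and return its row id."""
        with self.conn:
            cur = self.conn.execute(
                "INSERT INTO tasks (payload, priority) VALUES (?, ?)",
                (payload, int(priority)),
            )
        return cur.lastrowid

    def take_batch(self, size):
        """Claim up to `size` pending tasks, highest priority first, oldest first."""
        with self.conn:
            rows = self.conn.execute(
                "SELECT id, payload FROM tasks WHERE status = 'pending' "
                "ORDER BY priority, id LIMIT ?",
                (size,),
            ).fetchall()
            self.conn.executemany(
                "UPDATE tasks SET status = 'running' WHERE id = ?",
                [(task_id,) for task_id, _ in rows],
            )
        return [payload for _, payload in rows]
```

Pulling tasks in fixed-size batches keeps only a small slice of a large document set in memory at once, and passing a file path instead of `:memory:` lets pending tasks survive a restart.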

Application Suite

Command Line Interface (CLI)

The interactive medical research CLI (bmlibrarian_cli.py) provides:

Full 12-step research workflow with enum-based orchestration
Human-in-the-loop decision points with auto-mode support
Query refinement and threshold adjustment capabilities
Counterfactual analysis for comprehensive evidence evaluation
Enhanced markdown export with proper citation formatting

Fact-Checker CLI

The fact-checking command-line tool (fact_checker_cli.py) provides:

Batch processing of biomedical statements from JSON files
Literature validation with AI-powered yes/no/maybe evaluations
SQLite database storage for persistent results and incremental processing
Evidence extraction with citation stance indicators and confidence assessment
Incremental mode - skip already-evaluated statements for efficient processing
Flexible thresholds - control relevance scoring and citation extraction
Validation support - compare AI evaluations against expected answers
Detailed output - comprehensive metadata, statistics, and evidence lists

Fact-Checker Review GUI

The human review desktop application (fact_checker_review_gui.py) provides:

Interactive review interface with statement-by-statement navigation
Blind mode - hide AI evaluations so that human judgments are not biased by them
Incremental mode - filter to show only unannotated statements for efficient review
Database integration - automatic SQLite database creation and intelligent JSON import/merge
Citation inspection - expandable cards with full abstracts and highlighted passages
Multi-user support - track annotations by different reviewers with metadata
Comparison view - see original annotations, AI evaluations, and human annotations side-by-side
Real-time persistence - all annotations saved immediately to database
Export functionality - save human-annotated results to JSON for analysis
Quality control - perfect for training data validation and model evaluation

Desktop Research Application

The GUI research application (bmlibrarian_research_gui.py) offers:

Native cross-platform desktop interface built with PySide6/Qt
Visual workflow progress with collapsible step cards
Multi-model query generation with smart pagination and result tracking
Progressive counterfactual audit trail with real-time updates
PostgreSQL audit trail for persistent session tracking
Real-time agent execution with configured AI models
Formatted markdown report preview with scrollable display
Direct file save functionality
Complete transparency into citation validation and rejection reasoning

Configuration Interface

The configuration GUI (bmlibrarian_config_gui.py) provides:

Tabbed interface for agent-specific configuration
Model selection with live refresh from Ollama server
Parameter adjustment with interactive sliders and visual value displays
Multi-model query generation configuration tab for setting up multiple models
Connection testing and validation tools
Support for configuring query diversity, pagination, and performance tracking

Laboratory Tools

Paper Reviewer Lab (paper_reviewer_lab.py): Comprehensive paper assessment with 11-step unified workflow (PySide6/Qt)
Paper Checker Lab (paper_checker_lab.py): Interactive medical abstract fact-checking with step-by-step visualization
Paper Weight Lab (paper_weight_lab.py): Evidential weight assessment across five quality dimensions (PySide6/Qt)
PubMed Search Lab (pubmed_search_lab.py): Search PubMed API directly without local database (PySide6/Qt)
QueryAgent Lab (query_lab.py): Experimental interface for natural language to SQL conversion
PICO Lab (pico_lab.py): Interactive PICO component extraction from research papers
PRISMA 2020 Lab (prisma2020_lab.py): Systematic review compliance assessment against 27-item checklist
Transparency Lab (transparency_lab.py): Undisclosed bias risk assessment for research papers
Study Assessment Lab (study_assessment_lab.py): Research quality and trustworthiness evaluation
Citation Lab (citation_lab.py): Citation extraction experimentation
Agent Demonstrations: Examples showcasing multi-agent capabilities in examples/ directory

Configuration System

Configuration File Locations

BMLibrarian uses a hierarchical configuration system:

Primary: ~/.bmlibrarian/config.json (recommended, OS-agnostic)
Legacy fallback: bmlibrarian_config.json in current directory
GUI default: Always saves to ~/.bmlibrarian/config.json
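
The lookup order above can be sketched in a few lines. This is an illustration rather than BMLibrarian's loader: `find_config_file` and `load_config` are hypothetical names, and returning an empty dictionary when no file exists is an assumption (the real system presumably falls back to built-in defaults).

```python
import json
from pathlib import Path
from typing import Optional


def find_config_file(home: Path, cwd: Path) -> Optional[Path]:
    """Return the first existing config file: primary location, then legacy fallback."""
    primary = home / ".bmlibrarian" / "config.json"
    legacy = cwd / "bmlibrarian_config.json"
    for candidate in (primary, legacy):
        if candidate.is_file():
            return candidate
    return None


def load_config(home: Optional[Path] = None, cwd: Optional[Path] = None) -> dict:
    """Load the configuration, or an empty dict when no file is found (assumption)."""
    path = find_config_file(home or Path.home(), cwd or Path.cwd())
    if path is None:
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

Passing `home` and `cwd` explicitly keeps the function easy to test against temporary directories.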

Agent Configuration

Each agent can be individually configured with:

Model Selection: Choose from available Ollama models
Temperature: Control creativity/randomness (0.0-1.0)
Top-P: Control nucleus sampling (0.0-1.0)
Agent-Specific Settings: Citation count limits, scoring thresholds, etc.

Multi-Model Query Generation Configuration

Configure query diversity for improved document retrieval:

Multi-Model Enabled: Toggle feature on/off (default: disabled)
Models: Select up to 3 different AI models for query generation
Queries Per Model: Generate 1-3 diverse queries per model
Execution Mode: Serial execution optimized for local instances
De-duplication: Automatic query and document de-duplication
User Control: Option to review and select generated queries before execution

Example configuration:

{
  "query_generation": {
    "multi_model_enabled": true,
    "models": ["medgemma-27b-text-it-Q8_0:latest", "gpt-oss:20b", "medgemma4B_it_q8:latest"],
    "queries_per_model": 1,
    "execution_mode": "serial",
    "deduplicate_results": true,
    "show_all_queries_to_user": true,
    "allow_query_selection": true
  }
}
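
The limits described above (up to 3 models, 1-3 queries per model, serial execution) can be checked before a configuration is saved. The following sketch uses hypothetical helper names, and its query normalization rule (case and whitespace folding) is an assumption about what counts as a duplicate.

```python
def check_query_generation(settings: dict) -> list:
    """Return a list of problems found in a query_generation settings block."""
    problems = []
    models = settings.get("models", [])
    if not 1 <= len(models) <= 3:
        problems.append("select between 1 and 3 models")
    if len(set(models)) != len(models):
        problems.append("models must not repeat")
    per_model = settings.get("queries_per_model", 1)
    if not isinstance(per_model, int) or not 1 <= per_model <= 3:
        problems.append("queries_per_model must be an integer from 1 to 3")
    if settings.get("execution_mode", "serial") != "serial":
        problems.append("only serial execution is described for local instances")
    return problems


def deduplicate_queries(queries):
    """Drop queries that differ only in case or whitespace, keeping first-seen order."""
    seen = set()
    unique = []
    for query in queries:
        key = " ".join(query.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(query)
    return unique


example = {
    "multi_model_enabled": True,
    "models": ["medgemma-27b-text-it-Q8_0:latest", "gpt-oss:20b", "medgemma4B_it_q8:latest"],
    "queries_per_model": 1,
    "execution_mode": "serial",
}
```

With three models and one query each, at most three queries reach the database, and fewer if two models happen to produce the same query.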

Environment Variables

# Database Configuration
POSTGRES_USER=your_username
POSTGRES_PASSWORD=your_password  
POSTGRES_HOST=localhost          # Default: localhost
POSTGRES_PORT=5432              # Default: 5432
POSTGRES_DB=knowledgebase       # Default: knowledgebase

# File System
PDF_BASE_DIR=~/knowledgebase/pdf # Base directory for PDF files

# AI/LLM Configuration  
OLLAMA_BASE_URL=http://localhost:11434  # Ollama server URL

Using .env Files

Create a .env file in your project directory:

# Database settings
POSTGRES_USER=bmlib_user
POSTGRES_PASSWORD=secure_password
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=knowledgebase

# AI settings
OLLAMA_BASE_URL=http://localhost:11434
PDF_BASE_DIR=~/knowledgebase/pdf
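
For scripts that need the same settings, the variables above can be read with their documented defaults. This is a sketch under assumptions: `read_settings` is a hypothetical helper, it expects the .env file to have already been loaded into the process environment, and the fallback values for OLLAMA_BASE_URL and PDF_BASE_DIR are taken from the examples above rather than from confirmed defaults.

```python
import os
from pathlib import Path
from typing import Mapping


def read_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect database, Ollama, and PDF settings from environment variables."""
    missing = [name for name in ("POSTGRES_USER", "POSTGRES_PASSWORD") if not env.get(name)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    return {
        # Keys follow the libpq connection keyword names.
        "postgres": {
            "user": env["POSTGRES_USER"],
            "password": env["POSTGRES_PASSWORD"],
            "host": env.get("POSTGRES_HOST", "localhost"),
            "port": int(env.get("POSTGRES_PORT", "5432")),
            "dbname": env.get("POSTGRES_DB", "knowledgebase"),
        },
        # Assumed fallbacks, mirroring the example values above.
        "ollama_base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        "pdf_base_dir": Path(env.get("PDF_BASE_DIR", "~/knowledgebase/pdf")).expanduser(),
    }
```

Failing early on a missing user or password gives a clearer message than a connection error later in the workflow.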

Default AI Models

Complex Tasks: gpt-oss:20b (comprehensive analysis, report generation)
Fast Processing: medgemma4B_it_q8:latest (quick scoring, classification)
Multi-Model Query Generation: Combine multiple models for query diversity:
- medgemma-27b-text-it-Q8_0:latest (medical domain specialist)
- gpt-oss:20b (general purpose with strong reasoning)
- medgemma4B_it_q8:latest (fast queries with medical focus)

Documentation

Comprehensive documentation is available in the doc/ directory:

User Guides

Getting Started - Quick start guide and installation
Configuration Guide - System configuration and settings
CLI Guide - Command-line interface usage
Research GUI Guide - Desktop research application
Config GUI Guide - Configuration interface
Paper Reviewer Lab Guide - Comprehensive paper assessment
Fact Checker Guide - LLM training data auditing and statement verification
Fact Checker Review Guide - Human annotation and review GUI
Paper Checker Guide - Medical abstract fact-checking
Paper Weight Lab Guide - Evidential weight assessment
PICO Agent Guide - PICO component extraction for systematic reviews
PRISMA 2020 Guide - Systematic review compliance assessment
Study Assessment Guide - Research quality evaluation
Transparency Assessment Guide - Undisclosed bias risk detection
Document Interrogation Guide - Interactive document Q&A
Full-Text Discovery Guide - PDF discovery and download
PDF Export Guide - Markdown to PDF export
Query Agent Guide - Natural language query processing
Multi-Model Query Guide - Multi-model query generation
Citation Guide - Citation extraction and formatting
Reporting Guide - Report generation and export
Counterfactual Guide - Contradictory evidence analysis
Systematic Review Guide - Systematic literature review workflow
Audit Validation Guide - Human validation of audit trail items
Writing Plugin Guide - Citation-aware markdown editor
Settings Migration Guide - Database-backed settings migration
OpenAthens Guide - Institutional proxy authentication
MedRxiv Import Guide - MedRxiv preprint import
Document Embedding Guide - Document embedding generation
Workflow Guide - Workflow orchestration system
Troubleshooting - Common issues and solutions

Developer Documentation

Agent Module - Multi-agent system architecture
Citation System - Citation processing internals
Reporting System - Report generation system
Counterfactual System - Evidence analysis framework
Fact Checker System - Fact-checking architecture and internals
Paper Checker Architecture - PaperChecker system design
PICO Agent - PICO extraction system internals
PRISMA 2020 System - PRISMA compliance assessment system
Study Assessment System - Research quality evaluation system
Transparency Assessment System - Bias risk detection architecture
Full-Text Discovery System - PDF discovery architecture
Document Card Factory - GUI document card system
Multi-Model Architecture - Multi-model query generation
Audit Validation System - Human validation architecture
Writing System - Citation-aware editor internals

Development

Development Environment Setup

Clone the repository:

git clone https://github.com/hherb/bmlibrarian.git
cd bmlibrarian

Install dependencies using uv (recommended):

uv sync

Set up environment:

# Copy example environment file
cp .env.example .env
# Edit .env with your database and Ollama settings

Start required services:

# Start Ollama service for AI inference
ollama serve

# Ensure PostgreSQL is running with pgvector
psql -c "CREATE EXTENSION IF NOT EXISTS vector;"

Database migrations run automatically:

# No manual migration required! The system automatically:
# - Detects your database schema version
# - Applies any pending migrations on first startup
# - Creates audit trail tables for research tracking
# - Tracks migration history for safe upgrades

Testing

BMLibrarian includes comprehensive testing for all agents and workflow components:

# Run all tests with coverage
uv run python -m pytest tests/ --cov=src/bmlibrarian

# Test specific components
uv run python -m pytest tests/test_query_agent.py
uv run python -m pytest tests/test_scoring_agent.py
uv run python -m pytest tests/test_citation_agent.py
uv run python -m pytest tests/test_reporting_agent.py
uv run python -m pytest tests/test_counterfactual_agent.py

# Run integration tests (requires database)
uv run python -m pytest tests/ -m integration

# Test CLI and GUI applications
uv run python bmlibrarian_cli.py --quick
uv run python bmlibrarian_research_gui.py --auto "test question" --quick
uv run python bmlibrarian_config_gui.py --debug

Test suite: 145 test files across all modules

Development Commands

# Run agent demonstrations
uv run python examples/agent_demo.py
uv run python examples/citation_demo.py  
uv run python examples/reporting_demo.py
uv run python examples/counterfactual_demo.py

# Launch laboratory tools
uv run python query_lab.py  # QueryAgent experimental interface

# Run applications in development mode
uv run python bmlibrarian_cli.py --debug
uv run python bmlibrarian_research_gui.py --debug
uv run python bmlibrarian_config_gui.py --debug

Development Principles

Modern Python Standards: Uses Python ≥3.12 with type hints and pyproject.toml
Enum-Based Architecture: Flexible workflow orchestration with meaningful step names
Comprehensive Testing: Unit tests for all agents with realistic test data
Documentation First: Both user guides and developer documentation for all features
AI-Powered: Local LLM integration via Ollama for privacy-preserving processing
Scalable Architecture: Queue-based processing for memory-efficient large-scale operations
Database-First Design: PostgreSQL audit trail for complete research workflow tracking
Performance Monitoring: Built-in query performance tracking and optimization insights
Zero-Configuration Migrations: Automatic database schema updates on startup

Code Quality Standards

BaseAgent Pattern: All agents inherit from BaseAgent with standardized interfaces
Configuration Integration: Agents use get_model() and get_agent_config() from config system
Document ID Integrity: Always use real database IDs, never mock/fabricated references
Workflow Integration: Agents support enum-based workflow system execution
No Artificial Limits: Process ALL documents unless explicitly configured otherwise

Security & Best Practices

Credentials: Never hardcode passwords; use environment variables and .env files
Local AI Processing: Uses local Ollama service to keep research data private
Database Safety: Never modify the production database "knowledgebase" without permission
Data Integrity: All document IDs are programmatically verified to prevent hallucination
Input Validation: All user inputs and LLM outputs are validated and sanitized
Error Handling: Robust error recovery and logging throughout the system
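
The data-integrity rule above amounts to checking every cited document ID against the set of IDs that were actually retrieved. A minimal sketch follows, with a hypothetical function name and a hypothetical `document_id` key; the real citation objects may be shaped differently.

```python
def split_citations(citations, retrieved_ids):
    """Separate citations whose document_id was actually retrieved from those that were not."""
    known = set(retrieved_ids)
    verified, rejected = [], []
    for citation in citations:
        target = verified if citation.get("document_id") in known else rejected
        target.append(citation)
    return verified, rejected
```

Keeping the rejected list, rather than silently dropping it, makes it possible to report why a citation was excluded.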

Contributing

We welcome contributions to BMLibrarian! Areas for contribution include:

Agent Development

New specialized agents for literature analysis tasks
Enhanced natural language processing capabilities
Improved evidence synthesis and reporting algorithms

Workflow Enhancement

Additional workflow steps for specialized research domains
Enhanced iterative capabilities and human-in-the-loop features
Integration with external biomedical databases and APIs

User Experience

GUI improvements and new interface features
Enhanced visualization of research workflow progress
Mobile and web-based interface development

Documentation & Testing

Expanded user guides and tutorials
Additional agent demonstrations and examples
Performance testing and optimization

Project Status & Maturity

BMLibrarian is a production-ready system with:

15+ Specialized AI Agents: Full multi-agent architecture with sophisticated coordination
Systematic Review Automation: Checkpoint-based reviews with Cochrane/GRADE assessment
Comprehensive Workflow System: 12-step research process with iterative capabilities
Robust Infrastructure: Queue orchestration, error handling, semantic chunking, and progress tracking
26 CLI/GUI Applications: Research, configuration, fact-checking, systematic review, import tools
16 GUI Plugins: Modular PySide6/Qt plugin architecture
134 Test Files: Comprehensive test coverage across all modules
272 Documentation Files: User guides and developer documentation for every component
Privacy-First: All AI processing runs locally via Ollama

License

[License information to be added]

Support & Community

Documentation: Comprehensive guides available in the doc/ directory
Issues: Report bugs and feature requests via GitHub issues
Discussions: Join our community discussions for questions and collaboration
Examples: Review demonstration scripts in the examples/ directory

Acknowledgments

BMLibrarian builds upon the power of:

PostgreSQL + pgvector: High-performance semantic search capabilities
Ollama: Local, privacy-preserving language model inference
PySide6/Qt: Cross-platform native desktop GUI framework
ReportLab: Professional PDF generation (BSD license)
Playwright: Browser automation for PDF discovery and OpenAthens authentication
Python Ecosystem: Modern Python >=3.12 with comprehensive typing support

BMLibrarian: The Biomedical Researcher's AI Workbench—evidence-based answers, peer-review quality assessment, and systematic fact-checking, all running locally on your hardware.
