Document Search API

REST API for full-text indexing and search of PDF documents and web pages using SQLite FTS5.

Features

PDF and URL indexing
Full-text search with SQLite FTS5
Exact term position highlighting
Context snippet extraction
Complete document management (CRUD)
Automatic URL duplicate prevention

Run

dsa --host <host> --port <port>

By default:

host = localhost
port = 8000

Access interactive documentation at: http://<host>:<port>/docs

Endpoints

Indexing

Upload PDF

POST /documents/upload
Content-Type: multipart/form-data

file: document.pdf

Response:

{
  "id": "uuid",
  "message": "Document indexed successfully"
}
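A minimal client sketch for this call, using only the Python standard library. The helper name `build_upload_request` and the hand-rolled multipart encoding are assumptions for illustration and are not part of the project; the form field name `file` and the endpoint path come from the request shown above.

```python
import uuid
import urllib.request


def build_upload_request(base_url: str, filename: str, data: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST for /documents/upload (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    # A single form part named "file", matching the field shown above.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/documents/upload",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

With a server running on the default host and port, passing the returned request to `urllib.request.urlopen` and decoding the body with `json.load` should yield the `id` and `message` fields shown above.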

Index URL

POST /documents/from-url?url=https://example.com

Response:

{
  "id": "uuid",
  "message": "URL indexed successfully",
  "action": "created"
}

If the URL is already indexed, its content is updated and the existing ID is kept; the response then reports "action": "updated".
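Because the page address travels inside a query parameter, it is safest to percent-encode it so that its own "?" and "&" characters are not read as part of the API call. The sketch below shows one way to do that with the standard library; both helper names are hypothetical.

```python
import urllib.parse
import urllib.request


def build_index_url_request(base_url: str, page_url: str) -> urllib.request.Request:
    """Build the POST for /documents/from-url (hypothetical helper)."""
    # urlencode percent-encodes the target URL, including ":", "/", "?" and "&".
    query = urllib.parse.urlencode({"url": page_url})
    return urllib.request.Request(f"{base_url}/documents/from-url?{query}", method="POST")


def is_new_document(response: dict) -> bool:
    """True when the parsed response reports a first-time index of the URL."""
    return response.get("action") == "created"
```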

Search

Basic Search

GET /search?query=python

Returns a list of matching documents, each with its ID, title, and type.

Search with Positions

GET /search?query=python&include_matches=true

Adds exact positions for each occurrence:

{
  "id": "uuid",
  "title": "document.pdf",
  "type": "application/pdf",
  "match_count": 5,
  "matches": [
    {
      "term": "python",
      "start": 245,
      "end": 251,
      "matched_text": "Python"
    }
  ]
}
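In the sample above, end - start equals 6, the length of "Python", which suggests that `end` is exclusive and that the offsets can be used directly as a Python slice. Under that assumption, a client could highlight matches in the document content roughly as follows; the `highlight` function and its bracket markers are illustrative only.

```python
def highlight(content: str, matches: list[dict], open_tag: str = "[", close_tag: str = "]") -> str:
    """Wrap each matched span in markers, assuming end-exclusive character offsets."""
    out = []
    cursor = 0
    for m in sorted(matches, key=lambda m: m["start"]):
        if m["start"] < cursor:
            continue  # skip a span that overlaps one already emitted
        out.append(content[cursor:m["start"]])
        out.append(open_tag + content[m["start"]:m["end"]] + close_tag)
        cursor = m["end"]
    out.append(content[cursor:])
    return "".join(out)
```

The content itself would come from GET /documents/{id}, since the search response carries only the positions.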

Complete Search

GET /search?query=python&include_matches=true&include_snippets=true

Adds context snippets around each match.

Optional parameters:

max_matches: Match limit per term (default: 200, max: 1000)
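The three search variants differ only in their query parameters, so a client can build them from one function. This is a sketch with a hypothetical helper name; the client-side check on `max_matches` simply mirrors the documented maximum of 1000 and makes no claim about how the server validates the value.

```python
import urllib.parse


def build_search_url(base_url, query, include_matches=False,
                     include_snippets=False, max_matches=None):
    """Assemble a GET /search URL from the documented parameters."""
    params = {"query": query}
    if include_matches:
        params["include_matches"] = "true"
    if include_snippets:
        params["include_snippets"] = "true"
    if max_matches is not None:
        if max_matches > 1000:
            raise ValueError("max_matches may not exceed 1000")
        params["max_matches"] = str(max_matches)
    return f"{base_url}/search?{urllib.parse.urlencode(params)}"
```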

List Documents

GET /documents

Returns the ID, title, and type of every document.

Get by ID

GET /documents/{id}

Returns the complete document, including its content.

Delete

DELETE /documents/{id}

Deletes a document by ID.

Data Structure

Document in Database

id: Unique UUID
title: File name or URL
content: Extracted text
type: application/pdf or web

FTS5 Index

An SQLite FTS5 (Full-Text Search 5) virtual table indexes id, title, content, and type for fast search.
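As a rough illustration of such an index, the sketch below creates an FTS5 virtual table with the four columns listed above and queries it with MATCH. The table name `documents_fts` and the function names are assumptions, the project's actual schema may differ, and the code only runs if the interpreter's SQLite build includes FTS5.

```python
import sqlite3


def create_index(conn: sqlite3.Connection) -> None:
    # Hypothetical table name; the columns follow the fields listed above.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS documents_fts "
        "USING fts5(id, title, content, type)"
    )


def add_document(conn, doc_id: str, title: str, content: str, doc_type: str) -> None:
    conn.execute(
        "INSERT INTO documents_fts (id, title, content, type) VALUES (?, ?, ?, ?)",
        (doc_id, title, content, doc_type),
    )


def search(conn, query: str) -> list[tuple]:
    # MATCH consults the inverted index; "rank" orders by FTS5's built-in relevance.
    rows = conn.execute(
        "SELECT id, title, type FROM documents_fts "
        "WHERE documents_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()
```

The default FTS5 tokenizer folds case, so a query for "python" also finds "Python".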

Text Search

How It Works

SQLite FTS5 identifies relevant documents using an inverted index
A regular expression finds the exact position of each term in the content
Snippet extraction retrieves the context around each match
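The last step can be pictured as slicing a window of text around a match. The following is a minimal sketch, not the project's implementation: the function name, the `radius` parameter, and the "..." truncation markers are all assumptions.

```python
def extract_snippet(content: str, start: int, end: int, radius: int = 40) -> str:
    """Return the match plus up to `radius` characters of context on each side."""
    lo = max(0, start - radius)
    hi = min(len(content), end + radius)
    # Mark the sides where the window cuts off surrounding text.
    prefix = "..." if lo > 0 else ""
    suffix = "..." if hi < len(content) else ""
    return prefix + content[lo:hi] + suffix
```

A fixed character window can cut a word in half at either edge; widening the window to the nearest whitespace would be a natural refinement.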

Search Features

Case-insensitive matching
Whole-word search (no partial matches)
Multiple terms separated by spaces
Matches ordered by position in the document
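These behaviors map naturally onto Python's `re` module. The sketch below is one possible reading of them, with a hypothetical `find_matches` function: each space-separated term is matched case-insensitively between `\b` word boundaries, capped at `max_matches` per term, and the combined results are sorted by start offset. The dictionary keys follow the response shown under Search with Positions.

```python
import re


def find_matches(content: str, query: str, max_matches: int = 200) -> list[dict]:
    """Locate whole-word, case-insensitive occurrences of each query term."""
    matches = []
    for term in query.split():
        # \b on both sides rejects partial hits such as "pythonic" for "python".
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        for count, m in enumerate(pattern.finditer(content)):
            if count >= max_matches:
                break  # per-term limit reached
            matches.append({
                "term": term,
                "start": m.start(),
                "end": m.end(),  # end-exclusive offset
                "matched_text": m.group(),
            })
    matches.sort(key=lambda m: m["start"])
    return matches
```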

License

MIT License.
