REST API for full-text indexing and search of PDF documents and web pages using SQLite FTS5.
- PDF and URL indexing
- Full-text search with SQLite FTS5
- Exact term position highlighting
- Context snippet extraction
- Complete document management (CRUD)
- Automatic URL duplicate prevention
dsa --host <host> --port <port>

By default:
- host = localhost
- port = 8000
Access interactive documentation at: http://<host>:<port>/docs
POST /documents/upload
Content-Type: multipart/form-data
file: document.pdf

Response:
{
  "id": "uuid",
  "message": "Document indexed successfully"
}

POST /documents/from-url?url=https://example.com

Response:
{
  "id": "uuid",
  "message": "URL indexed successfully",
  "action": "created"
}

If the URL already exists, its content is updated and the same ID is kept (action: "updated").
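The create-or-update behavior can be pictured with a small sketch. This is not the project's actual code: the plain `documents` table, the helper names `make_db` and `index_url`, and the choice to look up the URL in the `title` column are all assumptions made for illustration.

```python
import sqlite3
import uuid


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical documents table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id TEXT PRIMARY KEY, title TEXT, content TEXT, type TEXT)"
    )
    return conn


def index_url(conn: sqlite3.Connection, url: str, content: str) -> dict:
    """Insert a URL document, or update it in place if the URL is already indexed."""
    row = conn.execute(
        "SELECT id FROM documents WHERE title = ? AND type = 'web'", (url,)
    ).fetchone()
    if row is not None:
        # Same URL seen before: refresh the content and keep the existing ID.
        conn.execute("UPDATE documents SET content = ? WHERE id = ?", (content, row[0]))
        conn.commit()
        return {"id": row[0], "message": "URL indexed successfully", "action": "updated"}
    # First time this URL is seen: create a new document with a fresh UUID.
    doc_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO documents (id, title, content, type) VALUES (?, ?, ?, 'web')",
        (doc_id, url, content),
    )
    conn.commit()
    return {"id": doc_id, "message": "URL indexed successfully", "action": "created"}
```

Calling `index_url` twice with the same URL returns the same `id` both times, first with `"created"` and then with `"updated"`.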
GET /search?query=python

Returns a list of documents with ID, title, and type.
GET /search?query=python&include_matches=true

Adds exact positions for each occurrence:
{
  "id": "uuid",
  "title": "document.pdf",
  "type": "application/pdf",
  "match_count": 5,
  "matches": [
    {
      "term": "python",
      "start": 245,
      "end": 251,
      "matched_text": "Python"
    }
  ]
}

GET /search?query=python&include_matches=true&include_snippets=true

Adds context snippets around each match.
Optional parameters:
max_matches: Match limit per term (default: 200, max: 1000)
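As a rough illustration of how the search parameters combine, the helper below assembles a request URL. The function name `build_search_url` is hypothetical, and clamping `max_matches` on the client side is an assumption; whether the server rejects or clamps values above 1000 is not stated here.

```python
from urllib.parse import urlencode


def build_search_url(host="localhost", port=8000, query="",
                     include_matches=False, include_snippets=False,
                     max_matches=None):
    """Build a /search URL from the documented query parameters."""
    params = {"query": query}
    if include_matches:
        params["include_matches"] = "true"
    if include_snippets:
        params["include_snippets"] = "true"
    if max_matches is not None:
        # Stay within the documented upper bound of 1000 (client-side assumption).
        params["max_matches"] = min(max_matches, 1000)
    return f"http://{host}:{port}/search?{urlencode(params)}"
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlopen`.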
GET /documents

Returns the id, title, and type of all documents.

GET /documents/{id}

Returns the complete document, including its content.

DELETE /documents/{id}

Deletes a document by ID.
id: Unique UUID
title: File name or URL
content: Extracted text
type: application/pdf or web
The index uses SQLite FTS5 (Full-Text Search 5) for optimized search over id, title, content, and type.
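A minimal sketch of such an index, using Python's built-in `sqlite3` module, might look like the following. The table name `documents`, the `search` helper, and the sample rows are assumptions; the real schema may differ (for example, some columns could be declared `UNINDEXED`). It also assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One FTS5 virtual table holding all four fields (assumed layout).
conn.execute("CREATE VIRTUAL TABLE documents USING fts5(id, title, content, type)")
conn.executemany(
    "INSERT INTO documents (id, title, content, type) VALUES (?, ?, ?, ?)",
    [
        ("doc-1", "guide.pdf", "Python makes text processing easy.", "application/pdf"),
        ("doc-2", "https://example.com", "An example page about gardening.", "web"),
    ],
)


def search(conn, query):
    """Return id, title, and type of documents matching the FTS5 query."""
    rows = conn.execute(
        # MATCH on the table name searches every indexed column;
        # ORDER BY rank puts the best matches first.
        "SELECT id, title, type FROM documents WHERE documents MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
    return [{"id": r[0], "title": r[1], "type": r[2]} for r in rows]
```

Note that the query string is passed to `MATCH` unchanged here, so FTS5 query syntax (quotes, `AND`, `OR`, and so on) applies; a production service would likely sanitize user input first.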
- SQLite FTS5 identifies relevant documents using inverted index
- Regex finds exact term positions in content
- Snippet extraction retrieves context around matches
- Case-insensitive
- Whole word search (no partial matches)
- Multiple terms separated by spaces
- Match ordering by position in document
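The position-finding and snippet steps listed above can be sketched in a few lines of Python. This is an illustrative approximation, not the project's code: the function names `find_matches` and `add_snippets`, the `snippet` field, and the 30-character context window are assumptions.

```python
import re


def find_matches(content, query, max_matches=200):
    """Locate whole-word, case-insensitive occurrences of each query term."""
    matches = []
    for term in query.split():  # multiple terms are separated by spaces
        # \b anchors enforce whole-word matching; re.escape keeps the term literal.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for count, m in enumerate(pattern.finditer(content)):
            if count >= max_matches:  # the limit applies per term
                break
            matches.append({
                "term": term,
                "start": m.start(),
                "end": m.end(),  # exclusive end offset
                "matched_text": m.group(0),
            })
    # Order all matches by their position in the document.
    matches.sort(key=lambda m: m["start"])
    return matches


def add_snippets(content, matches, context=30):
    """Attach up to `context` characters of surrounding text to each match."""
    for m in matches:
        lo = max(0, m["start"] - context)
        hi = min(len(content), m["end"] + context)
        m["snippet"] = content[lo:hi]
    return matches
```

With the text `"Python is fun. I like python, not pythonic snakes."` and the query `python`, this sketch reports the occurrences at offsets 0 and 22 and skips `pythonic`, since the trailing `\b` requires a word boundary.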
MIT License.