
Evaluate options for background task processing #18

@quevon24

Description

Context

We need background processing for three use cases:

  1. PDF compression - Run Ghostscript to compress uploaded scan PDFs
  2. Blackletter OCR pipeline - Run the blackletter ML pipeline to add OCR and metadata to scans
  3. Blackletter split pipeline - Run the blackletter ML pipeline to split a full book scan into individual opinions

Currently the app only supports uploading files. None of these processing features exist yet. We need to choose a background processing approach before implementing them, since these tasks (PDF compression, ML inference) are too slow to run in the request/response cycle.

Constraints:

  • ~10 concurrent users max
  • Upload rate limited by physical scanning speed (vflat)
  • Low volume workload, not a high-throughput pipeline
  • Already have PostgreSQL, want to avoid adding Redis if possible
  • Running on Docker Compose (dev) / K8s (prod)

Option 1: django.tasks + django-tasks DatabaseBackend

Django 6.0 ships a native django.tasks framework (DEP 14) with a standard API (@task decorator, .enqueue(), TaskResult). The django-tasks package provides a DatabaseBackend that stores tasks in PostgreSQL using SELECT ... FOR UPDATE SKIP LOCKED for safe concurrent pickup.

Infrastructure: +1 Docker service (worker running manage.py db_worker, reuses the same Django image)

Pros:

  • First-party Django API. Task definitions use django.tasks from Django core. Backends are swappable without changing task code
  • No Redis needed, uses existing PostgreSQL
  • Only one extra container for the worker (reuses the same image, just a different entrypoint command)
  • Clean, idiomatic code: @task decorator + .enqueue()
  • Transaction-safe enqueueing via transaction.on_commit()
  • Async support: aenqueue(), aget_result()
  • Future-proof: this is the direction Django is heading

Cons:

  • No built-in automatic retry. If the worker crashes mid-task, the task stays in RUNNING state forever
  • No built-in scheduling or delayed execution
  • Youngest ecosystem (~1 year), fewer community resources
  • DatabaseBackend polls for new tasks (slight latency vs push-based brokers)
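
Wiring this option up is essentially one setting. The backend path below follows the django-tasks README and should be verified against the installed version; this is a sketch, not tested config:

```python
# settings.py — route django.tasks through django-tasks' DatabaseBackend,
# which stores queued tasks in PostgreSQL and claims them with
# SELECT ... FOR UPDATE SKIP LOCKED.
TASKS = {
    "default": {
        "BACKEND": "django_tasks.backends.database.DatabaseBackend",
    }
}
```

Task code itself stays backend-agnostic: `@task`-decorated functions are invoked via `.enqueue()`, so swapping to a different backend later should only touch this setting.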

Stale task recovery (manual retry):
Since django.tasks has no automatic retry, we can implement a recovery pattern:

  • Add a processing_started_at timestamp to the Scan model
  • Write a management command that resets scans stuck in COMPRESSING/SCANNING status for longer than X minutes
  • Run it via a K8s CronJob every 15 minutes
  • This effectively gives us retry behavior without needing it built into the task framework

Option 2: subprocess.Popen + K8s CronJob

Spawn a detached OS process from the view that runs a Django management command. The child process is fully independent of the web worker (survives Gunicorn worker recycling).

Infrastructure: No new services. Management commands run as child processes of the web worker, or via K8s CronJobs for batch/scheduled work.

Pros:

  • Zero dependencies, zero configuration
  • Survives web worker restarts (start_new_session=True fully detaches the child process)
  • True parallelism (separate process, own GIL)
  • Management commands are testable and can be run manually from the shell
  • Good stepping stone before committing to a task queue

Cons:

  • 1-3 second startup overhead per task (loads Python + Django each time)
  • 50-200MB memory per subprocess
  • No concurrency control (must build a limiter yourself)
  • No task persistence or result tracking (must track status in the model)
  • Orphan process risk if commands hang
  • No visibility into running tasks without custom tooling
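
The detachment described above is a few lines of stdlib; the management-command invocation in the comment is a hypothetical example:

```python
import subprocess
import sys


def spawn_detached(args: list[str]) -> subprocess.Popen:
    """Spawn a fully detached child process.

    start_new_session=True puts the child in its own session (POSIX setsid),
    so it is not killed when Gunicorn recycles the web worker that spawned it.
    """
    return subprocess.Popen(
        args,
        start_new_session=True,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


# From a view, something like (command name hypothetical):
# spawn_detached([sys.executable, "manage.py", "compress_pdf", str(scan.pk)])
```

Note that redirecting stdio to DEVNULL means any output is lost unless the command does its own logging, which is exactly the "no visibility without custom tooling" con above.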

Option 3: Django-Q2 (ORM/PostgreSQL backend)

Third-party task queue (actively maintained fork of django-q). Can use the Django ORM as a broker, so no Redis needed.

Infrastructure: +1 Docker service (worker running manage.py qcluster)

Pros:

  • No Redis needed, uses existing PostgreSQL as broker
  • Best Django admin integration of all options (tasks, results, failures, and schedules all visible)
  • Built-in retry with configurable timeout
  • Supports task chains, schedules, result tracking
  • Mature codebase (original django-q is ~8 years old, Q2 fork is actively maintained)

Cons:

  • Third-party package, not on the Django standardization path (Django chose DEP 14 instead)
  • Multiprocessing model uses more memory per worker than threading
  • Smaller community than Celery
  • API uses string-based task references ("scanning.tasks.compress_pdf") rather than typed function calls
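
Pointing Django-Q2 at the ORM broker is a single settings dict. Key names are per the django-q2 configuration docs; the values below are guesses sized for this workload, not recommendations:

```python
# settings.py — Django-Q2 using the Django ORM (PostgreSQL) as its broker.
Q_CLUSTER = {
    "name": "scanning",   # hypothetical cluster name
    "workers": 2,         # low-volume workload, ~10 users
    "timeout": 600,       # kill tasks running longer than 10 minutes
    "retry": 700,         # re-queue unacknowledged tasks (must exceed timeout)
    "orm": "default",     # use the 'default' database connection as the broker
}
```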

Option 4: Huey + Redis

Lightweight task queue designed as a simpler Celery alternative. Requires Redis as a broker.

Infrastructure: +2 Docker services (Redis container + worker running manage.py run_huey)

Pros:

  • Clean, simple decorator API, less boilerplate than Celery
  • Single package with built-in Django integration
  • Supports periodic tasks, task pipelines, task locking
  • Lightweight and fast
  • Well-maintained (10+ years, single primary author)

Cons:

  • Requires Redis (new infrastructure dependency we don't currently have)
  • Smaller ecosystem, no built-in web dashboard
  • Single-process consumer model limits horizontal scaling
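
For comparison, Huey's Django integration is configured from settings. The dict form below follows the huey documentation for `huey.contrib.djhuey`; the `redis` hostname assumes a Redis service in Docker Compose, and all values are illustrative:

```python
# settings.py — Huey with a Redis broker via huey.contrib.djhuey.
HUEY = {
    "huey_class": "huey.RedisHuey",
    "name": "scanning",  # hypothetical queue name
    "connection": {"host": "redis", "port": 6379},
    "consumer": {"workers": 2},
}
```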

Option 5: Celery + Redis

The industry-standard distributed task queue for Python/Django.

Infrastructure: +2 Docker services (Redis + Celery worker). Optional +1 for Celery Beat (scheduled tasks).

Pros:

  • Most mature and well-documented option (15+ years, used by Instagram, Mozilla, Adyen)
  • Excellent retry support: automatic retries, exponential backoff, dead letter queues
  • Excellent monitoring: Flower web dashboard, django-celery-results admin
  • Handles complex workflows (chains, groups, chords)
  • Easy to scale horizontally

Cons:

  • Requires Redis (new infrastructure dependency)
  • Most boilerplate to set up (celery.py, broker config, worker management)
  • Overkill for our low-volume, staff-triggered workload
  • Configuration can be finicky (serializers, timezones, result backends)
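
The boilerplate con refers to the app bootstrap every Celery + Django project carries. The shape below follows Celery's "First steps with Django" guide; the `config` package name is an assumption about this project's layout:

```python
# config/celery.py — standard Celery application bootstrap.
import os

from celery import Celery

# Make Django settings importable before the app is configured.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "config.settings")

app = Celery("config")
# Read CELERY_* keys (broker URL, serializers, ...) from Django settings.
app.config_from_object("django.conf:settings", namespace="CELERY")
# Find tasks.py modules in all installed apps.
app.autodiscover_tasks()
```

On top of this you still need broker settings (e.g. `CELERY_BROKER_URL = "redis://redis:6379/0"`) and an import of `app` from the package's `__init__.py`, which is the setup cost the other options avoid.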

Comparison

| Criteria | django.tasks | subprocess + CronJob | Django-Q2 | Huey | Celery |
|---|---|---|---|---|---|
| New deps | 1 (django-tasks) | 0 | 1 (django-q2) | 1 (huey) | 2-3 |
| New Docker services | +1 (worker) | 0 | +1 (worker) | +2 (Redis + worker) | +2-3 (Redis + worker + beat) |
| Requires Redis | No | No | No | Yes | Yes |
| Task persistence | Yes (DB) | No | Yes (DB) | Yes (Redis) | Yes (Redis) |
| Retry support | Manual (CronJob) | Manual | Built-in | Built-in | Excellent |
| Admin UI | Basic | None | Best | Basic | Good (Flower) |
| Setup complexity | Low | Trivial | Low | Low-medium | Medium-high |
| Maturity | New (~1 year) | N/A | Good (~8 years) | Good (10+ years) | Excellent (15+ years) |

Recommendation: django.tasks + django-tasks DatabaseBackend

For our workload (staff-triggered, low volume, ~10 users, PostgreSQL already in place, no Redis), django.tasks is the best fit:

  1. No Redis. We already have PostgreSQL and don't need a new infrastructure dependency for our throughput level
  2. First-party API. django.tasks is in Django core. Task code is portable across backends, so if we ever need to swap to a Redis-backed backend, we change one setting
  3. Minimal overhead. Only one extra container running the same Django image with a different command
  4. Future-proof. This is where Django is heading (DEP 14). Investing in this API now means we're aligned with the ecosystem as it matures
  5. The retry gap is solvable. A management command + K8s CronJob that resets stuck tasks every 15 minutes gives us effective retry behavior. This pattern works regardless of which backend we choose
  6. Right-sized. Celery and Huey solve problems we don't have (high throughput, complex routing, distributed workers). Django-Q2 is solid but is a third-party library betting against Django's own standardization direction. subprocess works but has no persistence or visibility

If the ecosystem proves too young or we hit limitations, Django-Q2 is the natural fallback (also PostgreSQL-based, no Redis, good admin UI).
