Add staging environment for testing worker scaling on Docker Swarm#315

Open
samuelvkwong wants to merge 113 commits into main from add-staging-environment

Conversation

@samuelvkwong (Collaborator) commented Mar 25, 2026

Summary

Adds a staging deployment environment that enables testing worker scaling and Docker Swarm deployment features while maintaining development settings (DEBUG=True, debug toolbar, etc.).

Motivation

Currently, only production can use Docker Swarm for scaling workers. This PR enables developers to:

  • Test worker scaling locally before deploying to production
  • Test Docker Swarm deployment procedures
  • Validate multi-worker concurrent processing behavior
  • All while keeping dev-friendly settings (debug mode, detailed logging, etc.)

Changes

New Files

  • docker-compose.staging.yml - Swarm deployment config with dev settings
  • STAGING_DEPLOYMENT.md - Comprehensive deployment and usage guide

Modified Files

  • example.env - Added staging port variables to avoid conflicts

Key Features

  • Dev settings on Swarm: DEBUG=True, debug toolbar, console email
  • Different ports: runs on localhost:8001 (dev uses 8000)
  • Pre-scaled workers: starts with 3 mass_transfer_workers by default
  • No config needed: uses existing .env file
  • All three can run together: dev, staging, and production simultaneously

Port Allocation

No conflicts - all three environments can run at the same time:

| Service        | Dev   | Staging | Production |
|----------------|-------|---------|------------|
| Web UI         | 8000  | 8001    | 80, 443    |
| PostgreSQL     | 5432  | 5433    | (internal) |
| DICOM Receiver | 11122 | 11123   | 11112      |
| Orthanc1       | 7501  | 7503    | (internal) |
| Orthanc2       | 7502  | 7504    | (internal) |
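For illustration, the staging port remaps above might look roughly like this in the compose override. This is a hedged sketch only; the service names (`web`, `postgres`, `receiver`) are hypothetical and the actual `docker-compose.staging.yml` in this PR may be structured differently:

```yaml
# Hypothetical sketch of staging port remapping (service names illustrative)
services:
  web:
    ports:
      - "8001:8000"    # staging Web UI; dev binds 8000 directly
  postgres:
    ports:
      - "5433:5432"    # staging PostgreSQL, avoiding the dev 5432
  receiver:
    ports:
      - "11123:11112"  # staging DICOM receiver
```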

Usage

```shell
# Deploy staging (uses existing .env)
docker swarm init
docker stack deploy -c docker-compose.base.yml -c docker-compose.staging.yml adit-staging

# Access at http://localhost:8001

# Scale workers
docker service scale adit-staging_mass_transfer_worker=5

# Cleanup
docker stack rm adit-staging
```

Testing Checklist

  • Verify staging deploys successfully
  • Confirm accessible on http://localhost:8001
  • Test worker scaling with docker service scale
  • Verify all three environments can run simultaneously
  • Confirm debug toolbar visible in staging

See STAGING_DEPLOYMENT.md for full documentation.

Summary by CodeRabbit

  • New Features

    • Mass Transfer: bulk DICOM export with partitioned date ranges, JSON filter editor (CodeMirror), pseudonymization modes, optional DICOM→NIfTI conversion, job/task UI, volume listing, CSV export, background worker/queue with scaling, and preference persistence.
  • Bug Fixes / Reliability

    • Improved task cancellation, DIMSE/connection cleanup, safer handling of empty/partial transfer responses, task timeout/cleanup hooks, and worker restart/resilience.
  • Documentation

    • Staging deployment guide and full Mass Transfer specification added.
  • Tests

    • Extensive tests for partitions, pseudonymization, processor behavior, forms, and cleanup.

NumericalAdvantage and others added 30 commits February 18, 2026 16:06
The recursive split used `mid` as the boundary for both halves, causing
studies at the midpoint to appear in both. Additionally, since the DICOM
query operates at date-level granularity, same-day splits produced
identical queries returning the same results in both halves.

Shift the right half to start at mid + 1s and deduplicate by
StudyInstanceUID when merging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
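The fix described in this commit can be sketched in plain Python. The names `split_range` and `merge_results` are illustrative, not the project's actual API; the point is the two invariants the commit states: the right half starts at `mid + 1s`, and merged results are deduplicated by StudyInstanceUID:

```python
from datetime import datetime, timedelta

def split_range(start: datetime, end: datetime):
    # Split [start, end] into two non-overlapping halves.
    # The right half starts at mid + 1 second so the boundary
    # study cannot land in both halves (the original bug used
    # `mid` as the boundary of both).
    mid = start + (end - start) / 2
    return (start, mid), (mid + timedelta(seconds=1), end)

def merge_results(left: list[dict], right: list[dict]) -> list[dict]:
    # Deduplicate by StudyInstanceUID when merging, since the DICOM
    # query runs at date granularity and same-day splits can still
    # return overlapping studies.
    seen: set[str] = set()
    merged = []
    for study in left + right:
        uid = study["StudyInstanceUID"]
        if uid not in seen:
            seen.add(uid)
            merged.append(study)
    return merged
```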
One task per partition handles discovery + export + convert on a separate
mass_transfer queue. Includes review fixes and comprehensive test suite.
… actual reason, improve status message and fix previous jobs from not being processed if the worker processing them was dead
…tric folder structure, and two-mode pseudonymization
…mpty retrievals

Reverts the C-MOVE preference since C-MOVE requires PACS-side AE
registration. Restores study/series pseudonymized UID fields on
MassTransferVolume and populates them after DICOM anonymization.
Tracks image count from C-GET to skip series with 0 retrieved images
instead of passing empty directories to dcm2niix.
Restores 0009 (which removed the fields) and adds 0010 to re-add them,
since 0009 already ran on existing databases.
… C-GET

IMPAX PACS stores most images in compressed formats (JPEG Lossless, JPEG 2000,
etc.) but our storage contexts only offered uncompressed transfer syntaxes.
During C-GET, the PACS could not send images back because no matching transfer
syntax was negotiated, resulting in Success status with 0 images delivered.
The previous commit added compressed transfer syntaxes to
StoragePresentationContexts, but dimse_connector.py was only passing
cx.abstract_syntax to add_requested_context (which defaults to
uncompressed-only). Now passes cx.transfer_syntax so the PACS can
actually send images in JPEG Lossless and other compressed formats.
medihack and others added 30 commits March 27, 2026 22:27
Avoid cancelling the fetch task after the sentinel has been received,
ensuring exceptions from the completed fetch function propagate instead
of being swallowed by cancellation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
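The pattern this commit describes can be shown with a minimal asyncio sketch (the producer/consumer names are illustrative, not the project's code): once the sentinel has been received, the fetch task is awaited rather than cancelled, so an exception stored on the finished task propagates instead of being swallowed:

```python
import asyncio

SENTINEL = object()

async def fetch(queue: asyncio.Queue) -> None:
    # Illustrative producer: enqueue one item, then fail.
    try:
        await queue.put("item-1")
        raise RuntimeError("fetch failed")
    finally:
        await queue.put(SENTINEL)  # always signal end-of-stream

async def consume() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    fetch_task = asyncio.create_task(fetch(queue))
    items = []
    while True:
        item = await queue.get()
        if item is SENTINEL:
            break
        items.append(item)
    # Do NOT fetch_task.cancel() here: awaiting the completed task
    # re-raises any exception it stored, instead of the cancellation
    # masking it.
    await fetch_task
    return items
```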
…b FK

Procrastinate deletes finished jobs via a raw SQL stored procedure
(procrastinate_finish_job_v1), which bypasses Django's ORM-level
on_delete=SET_NULL handling. This caused a ForeignKeyViolation when
the worker tried to finalize completed MassTransferTasks, because the
database-level FK constraint on queued_job_id defaulted to NO ACTION.

The other apps (batch_transfer, selective_transfer, batch_query) already
had a RunSQL migration step calling procrastinate_on_delete_sql() to set
ON DELETE SET NULL at the PostgreSQL level. The mass_transfer app was
missing this step because the queued_job field was included in the
initial CreateModel migration rather than added later via AddField.

This resulted in an infinite retry loop: each task completed its DICOM
work successfully, but Procrastinate could not delete the job row, so
the task was re-picked and re-processed indefinitely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously mass transfer only supported folder destinations. This adds
support for server destinations by downloading series to a temp directory
and uploading via the destination operator. NIfTI conversion is disabled
for server destinations both in the form (Alpine.js checkbox disabling)
and backend (clean method clears the flag).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CSS overrides scoped to [data-bs-theme="dark"] so the JSON filter
editor automatically matches the site-wide theme toggle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nsfer

Create MassTransferVolume records in PENDING state immediately after
discovery so they appear in the UI while the task is running. Each
volume is updated in place during transfer and saved via a finally
block that guarantees persistence even on RetriableDicomError.

Key changes:
- Remove SeriesTransferResult dataclass; transfer methods now update
  MassTransferVolume fields directly instead of returning result objects
- Replace _group_series() with _create_pending_volumes() + _group_volumes()
- Remove _create_volume_record(); volumes are bulk-created upfront
- Add persistent parameter to DicomOperator to abstract over
  DimseConnector.auto_close
- Move destination setup (dest_operator/output_base) before try block
  with assert ensuring exactly one is set
- Add defensive PENDING check and safe volume.save() in finally block
- Replace protocol-specific comments (C-GET/C-FIND/DIMSE) with
  protocol-agnostic terms (fetch/query)
- Restyle task detail template with Back to Job button and pagination
- Add test fixture and _fake_export_success helper to reduce boilerplate

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Defers individual task queuing to a background Procrastinate job on the
default queue so that the HTTP view returns immediately instead of
blocking on potentially thousands of defer() calls. Adds factories,
a dedicated tasks module, and comprehensive tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace unlimited process timeout (None) with a 24-hour cap to prevent
runaway tasks from blocking the mass_transfer queue indefinitely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix task reset routing: override queue_pending_task() on MassTransferTask
  so single-task resets use the mass_transfer queue instead of the generic
  dicom queue with wrong timeout
- Deduplicate queue logic: queue_mass_transfer_tasks now delegates to
  task.queue_pending_task() instead of duplicating the defer/save code
- Add error handling to task queueing loop with logging and re-raise
- Narrow bare except clauses in _extract_dicom_metadata and
  _write_dicom_metadata to specific exception types with logging
- Move close() error guard from processor into DicomOperator.close()
- Add debug logging to silent except OSError: pass blocks
- Remove unused cleanup_on_failure hook from DicomTask and all call sites
- Add model-level clean() to MassTransferJob for date range validation
- Add CheckConstraint ensuring partition_start < partition_end
- Add MinValueValidator(1) to DicomServer.max_search_results
- Make FilterSpec and DiscoveredSeries frozen dataclasses
- Fix dimse_connector docstring referencing non-existent auto_config
- Add docstring to build_partitions(), expand filters_json help_text
- Sync inherited TransferTask field validators in migration
- Fix test fixtures to satisfy new partition constraint
- Add 10 new tests: filter JSON validation, discover_series filtering,
  date range rejection, source authorization
- Delete outdated mass_transfer_spec.md
- Document assertion usage policy in CLAUDE.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace plain string identifiers with realistic DICOM UID format
in test helper calls for consistency with actual DICOM data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace run_worker_once() with direct queue_mass_transfer_tasks() calls
in the queue tests. The worker spawned pebble daemon subprocesses that
left PostgreSQL connections open, causing a teardown warning when running
the full test suite. The queueing logic is now tested directly while
task processing is already covered by core/tests/test_tasks.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove unnecessary SHA hash suffix from study folder names since studies
are already nested under patient folders, and include series description
in the fallback when series number is missing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Decouple pseudonym computation from dicognito's IDAnonymizer to ensure
stability across library upgrades. The new standalone compute_pseudonym
function uses SHA-256 with divmod extraction into A-Z0-9, matching
dicognito's approach but with a stronger hash. Deterministic pseudonyms
are 14 chars, random pseudonyms are 15 chars so the two modes can be
distinguished by length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ce tests

Instead of raising an error when switching DICOM services (e.g. C-FIND
to C-GET) on a persistent connection, the DimseConnector now
transparently closes the old association and opens a new one. This
simplifies callers that need to chain different operations.

Also adds Playwright acceptance tests covering pseudonymized,
unpseudonymized, folder-destination, and NIfTI-conversion mass
transfers across all three transfer protocols (C-MOVE, C-GET, DICOMweb).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The procrastinate worker thread running queue_mass_transfer_tasks opens
a Django DB connection (via MassTransferTask.save() in queue_pending_task)
that was never closed, since worker threads don't go through Django's
request/response lifecycle. This left an idle PostgreSQL session that
blocked test database teardown, causing a PytestWarning.

Add db.close_old_connections() in a finally block, matching the pattern
already used in core/tasks.py _run_dicom_task.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nvironment

# Conflicts:
#	adit/mass_transfer/processors.py
#	adit/mass_transfer/tests/test_processor.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Cap filters JSON display at 20 lines with overflow scroll
- Clear queued_job_id before deleting procrastinate job on cancel
Add a new filter field that allows excluding series with fewer than
a specified number of instances. Includes form validation (must be >= 1),
processor filtering logic, and comprehensive tests for both.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows parsing semicolon-delimited CSVs (common in European Excel exports)
by passing e.g. --delimiter ";". Defaults to comma for backwards compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
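The delimiter behavior described above boils down to passing the option through to the CSV reader. A minimal stdlib sketch (the helper name is illustrative; the project's command-line flag is `--delimiter`):

```python
import csv
import io

def read_rows(text: str, delimiter: str = ",") -> list[list[str]]:
    # Comma by default for backwards compatibility; pass ";" for
    # semicolon-delimited CSVs common in European Excel exports.
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))
```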
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>