pclean is a modular, Dask-accelerated radio-interferometric imaging package
that wraps CASA's synthesis imaging C++ tools (casatools) to provide
transparent parallelism for cube (channel-distributed) and continuum
(row-distributed) imaging workflows.
| Feature | Description |
|---|---|
| Cube parallelism | Channels are distributed across Dask workers; each worker runs a complete imaging and deconvolution cycle on its sub-cube. |
| Continuum parallelism | Visibility rows are partitioned across Dask workers for major-cycle gridding; minor cycles run on the gathered, normalized image. |
| tclean-compatible API | Drop-in pclean() function accepting the same parameters as CASA tclean. |
| Hierarchical config | Pydantic v2 YAML-based configuration with presets, layered merging, and CASA bridge methods. |
| CLI support | Run imaging from the command line via python -m pclean. |
| SLURM clusters | Native Dask-Jobqueue integration for HPC batch scheduling. |
| Modular internals | Every building block — imager, deconvolver, normalizer, partitioner, cluster manager — is independently importable. |
| ADIOS2 support | Convert MeasurementSet columns to Adios2StMan for I/O benchmarking. Requires the casatools openmpi variant from conda-forge. |
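The hierarchical configuration can be illustrated with a sketch like the following. The key names here are hypothetical placeholders for illustration only; the actual schema is defined by pclean's Pydantic v2 models, and presets are merged beneath user-supplied values:

```yaml
# Hypothetical layout — real keys are defined by pclean's Pydantic models
preset: cube-default        # named preset, merged first
imaging:
  specmode: cube
  imsize: [512, 512]
  cell: 1arcsec
parallel:
  nworkers: 8
  cube_chunksize: 1
```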
```python
from pclean import pclean

# Parallel cube imaging (channels distributed across workers)
pclean(
    vis='my.ms',
    imagename='cube_out',
    specmode='cube',
    imsize=[512, 512],
    cell='1arcsec',
    niter=1000,
    deconvolver='hogbom',
    parallel=True,
    nworkers=8,
    cube_chunksize=1,  # one sub-cube per channel (max parallelism)
)

# Parallel continuum imaging (visibility rows chunked)
pclean(
    vis='my.ms',
    imagename='cont_out',
    specmode='mfs',
    imsize=[2048, 2048],
    cell='0.5arcsec',
    niter=5000,
    deconvolver='mtmfs',
    nterms=2,
    parallel=True,
    nworkers=4,
)
```

```shell
python -m pclean --vis my.ms --imagename out --specmode cube \
    --imsize 512 512 --cell 1arcsec --niter 1000 \
    --parallel --nworkers 8
```

Beyond the standard tclean parameters, pclean accepts:
| Parameter | Default | Description |
|---|---|---|
| `parallel` | `False` | Enable Dask-distributed parallelism. |
| `nworkers` | `None` | Number of Dask workers. `None` defaults to the available CPU count. |
| `scheduler_address` | `None` | Address of an existing Dask scheduler; when set, no local cluster is created. |
| `threads_per_worker` | `1` | Threads per Dask worker. Kept at 1 because CASA tools are not thread-safe. |
| `memory_limit` | `'0'` | Per-worker memory cap. `'0'` disables Dask memory management, preventing CASA C++ allocations from being paused or killed. |
| `local_directory` | `None` | Scratch directory for Dask spill-to-disk. |
| `cube_chunksize` | `-1` | Channels per sub-cube task. `-1` assigns one sub-cube per worker; `1` assigns one per channel. |
| `keep_subcubes` | `False` | Retain intermediate sub-cube images after concatenation. |
| `keep_partimages` | `False` | Retain partial images after the continuum gather. |
| `concat_mode` | `'auto'` | Concatenation strategy: `'auto'` (derive from `keep_subcubes`), `'paged'` (physical copy), `'virtual'` (reference catalog), `'movevirtual'` (rename into output). |
```
pclean/
├── src/pclean/
│   ├── __init__.py               # Package init, exposes pclean()
│   ├── __main__.py               # CLI entry point (python -m pclean)
│   ├── pclean.py                 # Top-level tclean-like interface
│   ├── params.py                 # Parameter container & validation
│   ├── imaging/
│   │   ├── serial_imager.py      # Single-process imager (base engine)
│   │   ├── deconvolver.py        # Deconvolution wrapper
│   │   └── normalizer.py         # Image normalization (gather/scatter)
│   ├── parallel/
│   │   ├── cluster.py            # Dask cluster lifecycle management
│   │   ├── cube_parallel.py      # Channel-parallel cube imaging
│   │   ├── continuum_parallel.py # Row-parallel continuum imaging
│   │   └── worker_tasks.py       # Serialisable functions for workers
│   └── utils/
│       ├── partition.py          # Data / image partitioning helpers
│       ├── image_concat.py       # Sub-cube image concatenation
│       ├── memory_estimate.py    # Worker RAM estimation heuristics
│       ├── check_adios2.py       # Adios2StMan availability check
│       └── convert_adios2.py     # MS → ADIOS2 conversion utility
```
Full documentation is hosted at pclean.readthedocs.io.
- Python ≥ 3.10
- casatools ≥ 6.5
- dask + distributed
- numpy
- pydantic ≥ 2.0
The project uses pixi for reproducible environment
management. Four environments are defined in pyproject.toml:
| Environment | Features | Description |
|---|---|---|
| `default` | `casa` | Runtime with casatools/casatasks from PyPI. |
| `default-forge` | `casa-forge` | Runtime with casatools/casatasks from conda-forge (includes the openmpi variant required for Adios2StMan). |
| `dev` | `casa, dev` | Runtime plus pytest, pytest-cov, and ruff. |
| `test` | `dev` | Linting and testing only (no casatools). |
Common tasks are exposed as pixi scripts:
```shell
pixi run -e dev test      # pytest -v
pixi run -e dev test-cov  # pytest with coverage
pixi run -e dev lint      # ruff check
pixi run -e dev fmt       # ruff format
```

pclean builds on the imaging and calibration infrastructure developed by
the CASA team at NRAO / ESO / NAOJ. The scientific algorithms — gridding,
deconvolution, self-calibration — are the product of decades of CASA
development; pclean is purely a computing-engineering effort that
re-orchestrates those mature tools with a modern distributed runtime.
If this package contributes to published research, please cite the CASA software:
CASA Team, Bean, B., Bhatnagar, S., et al. 2022, "CASA, the Common Astronomy Software Applications for Radio Astronomy," PASP, 134, 114501. doi:10.1088/1538-3873/ac9642
McMullin, J. P., Waters, B., Schiebel, D., Young, W., & Golap, K. 2007, "CASA Architecture and Applications," ASP Conf. Ser., 376, 127. ads:2007ASPC..376..127M
pclean's parallel design closely follows the Python orchestration layer that
CASA's tclean task already provides through the
casatasks.private.imagerhelpers module:
| CASA Python class | pclean equivalent | Role |
|---|---|---|
| `PySynthesisImager` | `SerialImager` | Serial imaging loop (init → PSF → major/minor → restore). |
| `PyParallelCubeSynthesisImager` | `ParallelCubeImager` | Each worker runs an independent `SerialImager` on a frequency sub-cube. |
| `PyParallelContSynthesisImager` | `ParallelContinuumImager` | Row-partitioned gridding across workers; minor cycles run serially on the coordinator. |
| `PyParallelImagerHelper` | `DaskClusterManager` | Cluster lifecycle, job dispatch, and result collection. |
The structural decomposition is the same: partition → image → normalize →
deconvolve → iterate, with the same split between embarrassingly-parallel cube
channels and gather/scatter continuum cycles. Both codebases use polymorphic
dispatch: `task_tclean.py` picks between `PySynthesisImager`,
`PyParallelCubeSynthesisImager`, and `PyParallelContSynthesisImager` based on
`specmode` and MPI availability; pclean makes the same choice based on its
own `parallel` and `is_cube` flags.
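The dispatch described above can be sketched as follows. This is a simplified illustration, not pclean's actual selection code:

```python
def select_imager(specmode: str, parallel: bool) -> str:
    """Pick an imager class name from specmode and the parallel flag."""
    is_cube = specmode.startswith("cube")  # e.g. 'cube'; 'mfs' is continuum
    if not parallel:
        return "SerialImager"
    return "ParallelCubeImager" if is_cube else "ParallelContinuumImager"

print(select_imager("cube", parallel=True))   # ParallelCubeImager
print(select_imager("mfs", parallel=True))    # ParallelContinuumImager
print(select_imager("mfs", parallel=False))   # SerialImager
```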
The key difference is the parallelism transport. CASA's
`PyParallelImagerHelper` sends Python code strings to MPI workers via
`casampi.MPIInterface`, requiring `mpicasa` and a shared filesystem. pclean
replaces this with Dask Distributed futures and actors, eliminating the MPI
dependency in exchange for Dask scheduling overhead.
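The futures transport is the standard submit/gather idiom. The sketch below uses the stdlib `concurrent.futures` to stay dependency-free; with Dask the same shape becomes `client.submit` and `client.gather` on a `distributed.Client`, and `image_channel` is a stand-in for the real per-channel worker task:

```python
from concurrent.futures import ThreadPoolExecutor

def image_channel(chan: int) -> str:
    """Stand-in for the per-channel imaging task a worker would run."""
    return f"subcube_{chan:04d}"

# Submit one task per channel, then gather results in channel order --
# no code strings are shipped, only serialisable functions and arguments.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(image_channel, c) for c in range(8)]
    subcubes = [f.result() for f in futures]

print(subcubes[0], subcubes[-1])  # subcube_0000 subcube_0007
```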
See also CASA Memo 13 (Sekhar, Rau & Xue 2024) for benchmarking of per-channel cube imaging distributed via SLURM job arrays that motivated this work (benchmarking scripts).
Copyright 2026 the pclean authors.
GPL-3.0-or-later — see LICENSE for details.
This project is an independent, personal effort developed on the authors' own time. It is not affiliated with, endorsed by, or conducted as part of any employer's projects or responsibilities.
This project was developed with the assistance of AI coding agents (GitHub Copilot, Claude). The AI contributed to code generation, debugging, and documentation under human direction and review.