# pclean — Parallel CLEAN Imaging with Dask


`pclean` is a modular, Dask-accelerated radio-interferometric imaging package that wraps CASA's synthesis imaging C++ tools (`casatools`) to provide transparent parallelism for cube (channel-distributed) and continuum (row-distributed) imaging workflows.

## Features

| Feature | Description |
|---------|-------------|
| Cube parallelism | Channels are distributed across Dask workers; each worker runs a complete imaging and deconvolution cycle on its sub-cube. |
| Continuum parallelism | Visibility rows are partitioned across Dask workers for major-cycle gridding; minor cycles run on the gathered, normalized image. |
| `tclean`-compatible API | Drop-in `pclean()` function accepting the same parameters as CASA `tclean`. |
| Hierarchical config | Pydantic v2 YAML-based configuration with presets, layered merging, and CASA bridge methods. |
| CLI support | Run imaging from the command line via `python -m pclean`. |
| SLURM clusters | Native Dask-Jobqueue integration for HPC batch scheduling. |
| Modular internals | Every building block (imager, deconvolver, normalizer, partitioner, cluster manager) is independently importable. |
| ADIOS2 support | Convert MeasurementSet columns to `Adios2StMan` for I/O benchmarking. Requires the casatools openmpi variant from conda-forge. |
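The continuum gather step described in the table (partial images combined and normalized before the serial minor cycles) can be sketched in plain Python. This is a conceptual sketch only; `normalize_gathered` is a hypothetical helper, not part of pclean's actual API:

```python
def normalize_gathered(partial_images, partial_weights):
    """Sum per-partition gridded images and their weights, then
    divide image by weight pixel-by-pixel -- a sketch of the
    gather + normalize step that precedes the minor cycles."""
    npix = len(partial_images[0])
    image = [0.0] * npix
    weight = [0.0] * npix
    for img, wt in zip(partial_images, partial_weights):
        for i in range(npix):
            image[i] += img[i]
            weight[i] += wt[i]
    # Guard against unsampled pixels with zero accumulated weight.
    return [im / w if w > 0 else 0.0 for im, w in zip(image, weight)]

# Two workers gridded the same two-pixel patch with unit weights:
combined = normalize_gathered([[2.0, 4.0], [2.0, 0.0]],
                              [[1.0, 1.0], [1.0, 1.0]])
# combined == [2.0, 2.0]
```

In the real package this combination happens on full image cubes rather than flat pixel lists, but the weight-normalized sum is the essential idea.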

## Quick start

```python
from pclean import pclean

# Parallel cube imaging (channels distributed across workers)
pclean(
    vis='my.ms',
    imagename='cube_out',
    specmode='cube',
    imsize=[512, 512],
    cell='1arcsec',
    niter=1000,
    deconvolver='hogbom',
    parallel=True,
    nworkers=8,
    cube_chunksize=1,       # one sub-cube per channel (max parallelism)
)

# Parallel continuum imaging (visibility rows chunked)
pclean(
    vis='my.ms',
    imagename='cont_out',
    specmode='mfs',
    imsize=[2048, 2048],
    cell='0.5arcsec',
    niter=5000,
    deconvolver='mtmfs',
    nterms=2,
    parallel=True,
    nworkers=4,
)
```

## Command-line interface

```sh
python -m pclean --vis my.ms --imagename out --specmode cube \
    --imsize 512 512 --cell 1arcsec --niter 1000 \
    --parallel --nworkers 8
```

## Additional parameters

Beyond the standard `tclean` parameters, `pclean` accepts:

| Parameter | Default | Description |
|-----------|---------|-------------|
| `parallel` | `False` | Enable Dask-distributed parallelism. |
| `nworkers` | `None` | Number of Dask workers. `None` defaults to the available CPU count. |
| `scheduler_address` | `None` | Address of an existing Dask scheduler; when set, no local cluster is created. |
| `threads_per_worker` | `1` | Threads per Dask worker. Kept at 1 because CASA tools are not thread-safe. |
| `memory_limit` | `'0'` | Per-worker memory cap. `'0'` disables Dask memory management, preventing CASA C++ allocations from being paused or killed. |
| `local_directory` | `None` | Scratch directory for Dask spill-to-disk. |
| `cube_chunksize` | `-1` | Channels per sub-cube task. `-1` assigns one sub-cube per worker; `1` assigns one per channel. |
| `keep_subcubes` | `False` | Retain intermediate sub-cube images after concatenation. |
| `keep_partimages` | `False` | Retain partial images after continuum gather. |
| `concat_mode` | `'auto'` | Concatenation strategy: `'auto'` (derive from `keep_subcubes`), `'paged'` (physical copy), `'virtual'` (reference catalog), `'movevirtual'` (rename into output). |
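The `cube_chunksize` semantics in the table can be illustrated with a small planning function. This is a sketch of the documented behaviour; `plan_subcubes` is a hypothetical name, not pclean's internal partitioner:

```python
import math

def plan_subcubes(nchan, nworkers, cube_chunksize=-1):
    """Split `nchan` channels into contiguous (start, stop) sub-cube
    tasks, mirroring the documented cube_chunksize semantics:
    -1 -> one sub-cube per worker; 1 -> one sub-cube per channel."""
    if cube_chunksize == -1:
        size = math.ceil(nchan / nworkers)
    else:
        size = cube_chunksize
    return [(start, min(start + size, nchan))
            for start in range(0, nchan, size)]

# 100 channels on 8 workers -> 8 sub-cubes of up to 13 channels each;
# cube_chunksize=1 -> one task per channel (maximum parallelism).
```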

## Architecture

```
pclean/
├── src/pclean/
│   ├── __init__.py                # Package init, exposes pclean()
│   ├── __main__.py                # CLI entry point (python -m pclean)
│   ├── pclean.py                  # Top-level tclean-like interface
│   ├── params.py                  # Parameter container & validation
│   ├── imaging/
│   │   ├── serial_imager.py       # Single-process imager (base engine)
│   │   ├── deconvolver.py         # Deconvolution wrapper
│   │   └── normalizer.py          # Image normalization (gather/scatter)
│   ├── parallel/
│   │   ├── cluster.py             # Dask cluster lifecycle management
│   │   ├── cube_parallel.py       # Channel-parallel cube imaging
│   │   ├── continuum_parallel.py  # Row-parallel continuum imaging
│   │   └── worker_tasks.py        # Serialisable functions for workers
│   └── utils/
│       ├── partition.py           # Data / image partitioning helpers
│       ├── image_concat.py        # Sub-cube image concatenation
│       ├── memory_estimate.py     # Worker RAM estimation heuristics
│       ├── check_adios2.py        # Adios2StMan availability check
│       └── convert_adios2.py      # MS → ADIOS2 conversion utility
```
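To make the division of labour between `cube_parallel.py` and `worker_tasks.py` concrete, here is a dependency-free sketch of the fan-out/gather pattern, using `concurrent.futures` in place of Dask so it runs anywhere. The function names are illustrative, not pclean's real internals:

```python
from concurrent.futures import ThreadPoolExecutor

def image_subcube(channel_range):
    """Stand-in worker task: in pclean this would run a complete
    imaging and deconvolution cycle on one frequency sub-cube."""
    start, stop = channel_range
    return f"subcube_{start}_{stop}"

def run_cube_parallel(nchan, nworkers):
    """Fan contiguous channel ranges out to a worker pool and gather
    the results (one range per worker, as with cube_chunksize=-1)."""
    size = -(-nchan // nworkers)  # ceiling division
    ranges = [(s, min(s + size, nchan)) for s in range(0, nchan, size)]
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        return list(pool.map(image_subcube, ranges))
```

In the real package the pool is a Dask cluster, the task writes a sub-cube image to disk, and the gathered names feed `image_concat.py`; the orchestration shape is the same.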

## Documentation

Full documentation is hosted at [pclean.readthedocs.io](https://pclean.readthedocs.io).

## Requirements

- Python ≥ 3.10
- casatools ≥ 6.5
- dask + distributed
- numpy
- pydantic ≥ 2.0

## Pixi environments

The project uses pixi for reproducible environment management. Four environments are defined in `pyproject.toml`:

| Environment | Features | Description |
|-------------|----------|-------------|
| `default` | `casa` | Runtime with casatools/casatasks from PyPI. |
| `default-forge` | `casa-forge` | Runtime with casatools/casatasks from conda-forge (includes the openmpi variant required for `Adios2StMan`). |
| `dev` | `casa`, `dev` | Runtime plus pytest, pytest-cov, and ruff. |
| `test` | `dev` | Linting and testing only (no casatools). |

Common tasks are exposed as pixi scripts:

```sh
pixi run -e dev test          # pytest -v
pixi run -e dev test-cov      # pytest with coverage
pixi run -e dev lint          # ruff check
pixi run -e dev fmt           # ruff format
```

## References and acknowledgements

pclean builds on the imaging and calibration infrastructure developed by the CASA team at NRAO / ESO / NAOJ. The scientific algorithms (gridding, deconvolution, self-calibration) are the product of decades of CASA development; pclean is purely a computing-engineering effort that re-orchestrates those mature tools with a modern distributed runtime.

If this package contributes to published research, please cite the CASA software:

CASA Team, Bean, B., Bhatnagar, S., et al. 2022, "CASA, the Common Astronomy Software Applications for Radio Astronomy," PASP, 134, 114501. doi:10.1088/1538-3873/ac9642

McMullin, J. P., Waters, B., Schiebel, D., Young, W., & Golap, K. 2007, "CASA Architecture and Applications," ASP Conf. Ser., 376, 127. ads:2007ASPC..376..127M

## Relation to CASA's built-in parallel imaging

pclean's parallel design closely follows the Python orchestration layer that CASA's `tclean` task already provides through the `casatasks.private.imagerhelpers` module:

| CASA Python class | pclean equivalent | Role |
|-------------------|-------------------|------|
| `PySynthesisImager` | `SerialImager` | Serial imaging loop (init → PSF → major/minor → restore). |
| `PyParallelCubeSynthesisImager` | `ParallelCubeImager` | Each worker runs an independent `SerialImager` on a frequency sub-cube. |
| `PyParallelContSynthesisImager` | `ParallelContinuumImager` | Row-partitioned gridding across workers; minor cycles run serially on the coordinator. |
| `PyParallelImagerHelper` | `DaskClusterManager` | Cluster lifecycle, job dispatch, and result collection. |

The structural decomposition is the same: partition → image → normalize → deconvolve → iterate, with the same split between embarrassingly-parallel cube channels and gather/scatter continuum cycles. Both codebases use polymorphic dispatch: `task_tclean.py` picks between `PySynthesisImager`, `PyParallelCubeSynthesisImager`, or `PyParallelContSynthesisImager` based on `specmode` and MPI availability; pclean makes the same choice based on its own `parallel` and `is_cube` flags.
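That dispatch choice can be sketched as follows (class names taken from the mapping table above; the real constructor signatures and selection code are assumptions):

```python
def select_imager(parallel: bool, is_cube: bool) -> str:
    """Mirror of the tclean-style polymorphic dispatch: return the
    name of the imager class a pclean-like driver would instantiate
    for the given parallel / specmode combination."""
    if not parallel:
        return "SerialImager"
    return "ParallelCubeImager" if is_cube else "ParallelContinuumImager"

# specmode='cube' with parallel=True -> channel-parallel path
assert select_imager(parallel=True, is_cube=True) == "ParallelCubeImager"
```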

The key difference is the parallelism transport. CASA's `PyParallelImagerHelper` sends Python code strings to MPI workers via `casampi.MPIInterface`, requiring `mpicasa` and a shared filesystem. pclean replaces this with Dask Distributed futures and actors, eliminating the MPI dependency in exchange for Dask scheduling overhead.

See also CASA Memo 13 (Sekhar, Rau & Xue 2024) for benchmarking of per-channel cube imaging distributed via SLURM job arrays that motivated this work (benchmarking scripts).

## License

Copyright 2026 the pclean authors.

GPL-3.0-or-later; see LICENSE for details.

## Disclaimer

This project is an independent, personal effort developed on the authors' own time. It is not affiliated with, endorsed by, or conducted as part of any employer's projects or responsibilities.

## AI Disclosure

This project was developed with the assistance of AI coding agents (GitHub Copilot, Claude). The AI contributed to code generation, debugging, and documentation under human direction and review.
