Merged
8 changes: 4 additions & 4 deletions README.md
@@ -45,8 +45,8 @@ from slide2vec import ExecutionOptions, Pipeline, PreprocessingConfig
pipeline = Pipeline(
model=model,
preprocessing=PreprocessingConfig(
target_spacing_um=0.5,
target_tile_size_px=224,
requested_spacing_um=0.5,
requested_tile_size_px=224,
tissue_threshold=0.1,
),
execution=ExecutionOptions(output_dir="outputs/demo"),
@@ -62,8 +62,8 @@ Tile embeddings can be spatially grouped into regions for downstream models that

```python
preprocessing = PreprocessingConfig(
target_spacing_um=0.5,
target_tile_size_px=224,
requested_spacing_um=0.5,
requested_tile_size_px=224,
region_tile_multiple=6, # 6x6 tiles per region
)
embedded = model.embed_slide("/path/to/slide.svs", preprocessing=preprocessing)
50 changes: 25 additions & 25 deletions docs/python-api.md
@@ -49,8 +49,8 @@ from slide2vec import Model, PreprocessingConfig
model = Model.from_preset("virchow2")
preprocessing = PreprocessingConfig(
backend="auto",
target_spacing_um=0.5,
target_tile_size_px=224,
requested_spacing_um=0.5,
requested_tile_size_px=224,
tissue_threshold=0.1,
segmentation={"downsample": 64},
filtering={"ref_tile_size": 224},
@@ -65,15 +65,15 @@ embedded = model.embed_slide("/path/to/slide.svs", preprocessing=preprocessing)

Common fields:

- `target_spacing_um`
- `target_tile_size_px`
- `requested_spacing_um`
- `requested_tile_size_px`
- `tissue_threshold`
- `backend` `"auto"`, `"cucim"`, `"openslide"`, `"vips"`, or `"asap"`
- `on_the_fly` read tiles directly from WSI during embedding (default `True`)
- `use_supertiles` group tiles into spatial blocks to reduce WSI read calls (default `True`)
- `read_coordinates_from` reuse pre-extracted coordinates
- `read_tiles_from` reuse pre-extracted tile tar archives
- `resume` resume from a previous tiling run (default `False`)
- `backend` - `"auto"`, `"cucim"`, `"openslide"`, `"vips"`, or `"asap"`
- `on_the_fly` - read tiles directly from WSI during embedding (default `True`)
- `use_supertiles` - group tiles into spatial blocks to reduce WSI read calls (default `True`)
- `read_coordinates_from` - reuse pre-extracted coordinates
- `read_tiles_from` - reuse pre-extracted tile tar archives
- `resume` - resume from a previous tiling run (default `False`)
- `preview`

For hierarchical extraction, see the [dedicated section](#hierarchical-feature-extraction) below.
@@ -100,15 +100,15 @@ Common fields:

- `batch_size`
- `num_gpus`
- `precision` `"fp16"`, `"bf16"`, `"fp32"`, or `None` (auto-determined from model)
- `num_workers` DataLoader workers (`None` means auto; this resolves to the job CPU budget, capped by SLURM and 64, except cuCIM on-the-fly mode derives `cpu_budget // num_cucim_workers`)
- `num_preprocessing_workers` hs2p tiling workers (default: all CPUs available to the job, capped by SLURM when present and limited to 64)
- `prefetch_factor` DataLoader prefetch factor (default `4`)
- `persistent_workers` keep DataLoader workers alive across batches (default `True`)
- `precision` - `"fp16"`, `"bf16"`, `"fp32"`, or `None` (auto-determined from model)
- `num_workers` - DataLoader workers (`None` means auto; this resolves to the job CPU budget, capped by SLURM and 64, except cuCIM on-the-fly mode derives `cpu_budget // num_cucim_workers`)
- `num_preprocessing_workers` - hs2p tiling workers (default: all CPUs available to the job, capped by SLURM when present and limited to 64)
- `prefetch_factor` - DataLoader prefetch factor (default `4`)
- `persistent_workers` - keep DataLoader workers alive across batches (default `True`)
- `output_dir`
- `output_format` `"pt"` (default) or `"npz"`
- `save_tile_embeddings` persist tile embeddings for slide-level models (default `False`)
- `save_latents` persist latent representations when available (default `False`)
- `output_format` - `"pt"` (default) or `"npz"`
- `save_tile_embeddings` - persist tile embeddings for slide-level models (default `False`)
- `save_latents` - persist latent representations when available (default `False`)

`num_gpus` defaults to all available GPUs. `embed_slide(...)` uses tile sharding for one slide, and `embed_slides(...)` balances whole slides across GPUs while preserving input order.

@@ -125,19 +125,19 @@ from slide2vec import Model, PreprocessingConfig

model = Model.from_preset("virchow2")
preprocessing = PreprocessingConfig(
target_spacing_um=0.5,
target_tile_size_px=224,
requested_spacing_um=0.5,
requested_tile_size_px=224,
region_tile_multiple=6, # 6x6 tiles per region
)
embedded = model.embed_slide("/path/to/slide.svs", preprocessing=preprocessing)
```

Config fields:

- `region_tile_multiple` region grid width/height in tiles (e.g., `6` means 6x6 = 36 tiles per region; must be >= 2)
- `target_region_size_px` — explicit parent region size in pixels; auto-derived from `target_tile_size_px * region_tile_multiple` if omitted
- `region_tile_multiple` - region grid width/height in tiles (e.g., `6` means 6x6 = 36 tiles per region; must be >= 2)
- `requested_region_size_px` - explicit parent region size in pixels; auto-derived from `requested_tile_size_px * region_tile_multiple` if omitted
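
The derivation and validation rules for these two fields can be sketched as a standalone function. This is a minimal sketch that mirrors the checks in `_resolve_hierarchical_preprocessing` from this diff; the function name `resolve_region_geometry` and its tuple return value are illustrative assumptions, not part of the slide2vec API.

```python
from __future__ import annotations


def resolve_region_geometry(
    requested_tile_size_px: int,
    region_tile_multiple: int | None = None,
    requested_region_size_px: int | None = None,
) -> tuple[int, int] | None:
    """Illustrative helper (not slide2vec API).

    Returns (requested_region_size_px, region_tile_multiple), or None when
    neither hierarchical field is set.
    """
    if region_tile_multiple is not None and region_tile_multiple < 2:
        raise ValueError("region_tile_multiple must be at least 2")
    if region_tile_multiple is None and requested_region_size_px is None:
        return None  # hierarchical extraction is not requested
    if requested_region_size_px is None:
        # Only the multiple is given: derive the region size from the tile size.
        requested_region_size_px = requested_tile_size_px * region_tile_multiple
    elif region_tile_multiple is None:
        # Only the region size is given: it must hold a whole number of tiles.
        if requested_region_size_px % requested_tile_size_px != 0:
            raise ValueError(
                "requested_region_size_px must be an exact multiple of requested_tile_size_px"
            )
        region_tile_multiple = requested_region_size_px // requested_tile_size_px
    elif requested_region_size_px != requested_tile_size_px * region_tile_multiple:
        # Both are given: they must agree with each other.
        raise ValueError(
            "requested_region_size_px must match requested_tile_size_px * region_tile_multiple"
        )
    return requested_region_size_px, region_tile_multiple


print(resolve_region_geometry(224, region_tile_multiple=6))  # (1344, 6)
```

With a 224 px tile, `region_tile_multiple=6` and `requested_region_size_px=1344` describe the same geometry, so passing either one (or both) resolves to the same pair.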

When the selected read spacing differs from `target_spacing_um`, hierarchical extraction resolves effective geometry tile-first: it scales `target_tile_size_px` to the effective read spacing, then derives the effective parent region as `effective_tile_size_px * region_tile_multiple`. This keeps unrolled subtile geometry aligned with the model-facing tile size contract under spacing-driven rounding.
When the selected read spacing differs from `requested_spacing_um`, hierarchical extraction resolves geometry tile-first: it scales `requested_tile_size_px` to the read spacing, then derives the read parent region as `read_tile_size_px * region_tile_multiple`. This keeps unrolled subtile geometry aligned with the model-facing tile size contract under spacing-driven rounding.
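
To illustrate why the order matters, here is a minimal sketch of tile-first resolution. It assumes the tile size is scaled by `requested_spacing_um / read_spacing_um` and rounded to the nearest integer; the exact rounding rule used by the library is not shown in this diff, and the function name `resolve_read_geometry` and the read spacing of 0.4942 µm are hypothetical.

```python
def resolve_read_geometry(
    requested_tile_size_px: int,
    requested_spacing_um: float,
    read_spacing_um: float,
    region_tile_multiple: int,
) -> tuple[int, int]:
    """Illustrative tile-first resolution (rounding rule is an assumption)."""
    scale = requested_spacing_um / read_spacing_um
    # Step 1: scale the tile to the spacing that is actually read.
    read_tile_size_px = int(round(requested_tile_size_px * scale))
    # Step 2: derive the parent region from the already-rounded tile size,
    # so the region always holds a whole number of tiles.
    read_region_size_px = read_tile_size_px * region_tile_multiple
    return read_tile_size_px, read_region_size_px


tile_px, region_px = resolve_read_geometry(224, 0.5, 0.4942, 6)
print(tile_px, region_px)  # 227 1362

# Region-first rounding, for comparison: the scaled region is no longer
# divisible by the multiple, so subtiles would not align.
region_first_px = int(round(224 * 6 * 0.5 / 0.4942))
print(region_first_px, region_first_px % 6)  # 1360 4
```

Under these assumptions, rounding the tile first yields a 1362 px region that splits into exactly 6x6 tiles of 227 px, whereas rounding the region first yields 1360 px, which does not divide evenly by 6.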

When persisted via `Pipeline`, hierarchical artifacts are written to `hierarchical_embeddings/` and `RunResult` includes a `hierarchical_artifacts` list.

@@ -152,8 +152,8 @@ from slide2vec import ExecutionOptions, Model, Pipeline, PreprocessingConfig

model = Model.from_preset("virchow2")
preprocessing = PreprocessingConfig(
target_spacing_um=0.5,
target_tile_size_px=224,
requested_spacing_um=0.5,
requested_tile_size_px=224,
tissue_threshold=0.1,
)
pipeline = Pipeline(
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -21,7 +21,7 @@ classifiers = [
"Programming Language :: Python :: 3.13",
]
dependencies = [
"hs2p[asap,cucim,openslide,vips]>=3.1.4",
"hs2p[asap,cucim,openslide,vips]>=3.2.0",
"omegaconf",
"matplotlib",
"numpy<2",
@@ -85,7 +85,7 @@ fm = [
"pandas",
"pillow",
"rich",
"hs2p[asap,cucim,openslide,vips]>=3.1.4",
"hs2p[asap,cucim,openslide,vips]>=3.2.0",
"wandb",
"torch>=2.3,<2.8",
"torchvision>=0.18.0",
4 changes: 2 additions & 2 deletions scripts/benchmark_embedding_throughput.py
@@ -787,8 +787,8 @@ def _build_model_pipeline_from_config(config: dict[str, Any]):
preview = dict(tiling_cfg.get("preview", {}))
preprocessing = PreprocessingConfig(
backend=str(tiling_cfg.get("backend", "asap")),
target_spacing_um=float(params.get("target_spacing_um", 0.5)),
target_tile_size_px=int(params.get("target_tile_size_px", 224)),
requested_spacing_um=float(params.get("requested_spacing_um", 0.5)),
requested_tile_size_px=int(params.get("requested_tile_size_px", 224)),
tolerance=float(params.get("tolerance", 0.05)),
overlap=float(params.get("overlap", 0.0)),
tissue_threshold=float(params.get("tissue_threshold", 0.01)),
8 changes: 4 additions & 4 deletions scripts/benchmark_end_to_end_paths.py
@@ -227,9 +227,9 @@ def _default_base_config(
"read_coordinates_from": None,
"read_tiles_from": None,
"params": {
"target_spacing_um": 0.5,
"requested_spacing_um": 0.5,
"tolerance": 0.05,
"target_tile_size_px": 256,
"requested_tile_size_px": 256,
"overlap": 0.0,
"tissue_threshold": 0.01,
"drop_holes": False,
@@ -440,8 +440,8 @@ def _build_pipeline_from_config_dict(config: dict[str, Any]):

preprocessing = PreprocessingConfig(
backend=str(tiling_cfg.get("backend", "cucim")),
target_spacing_um=float(params.get("target_spacing_um", 0.5)),
target_tile_size_px=int(params.get("target_tile_size_px", 256)),
requested_spacing_um=float(params.get("requested_spacing_um", 0.5)),
requested_tile_size_px=int(params.get("requested_tile_size_px", 256)),
tolerance=float(params.get("tolerance", 0.05)),
overlap=float(params.get("overlap", 0.0)),
tissue_threshold=float(params.get("tissue_threshold", 0.01)),
8 changes: 4 additions & 4 deletions scripts/benchmark_tile_read_strategies.py
@@ -281,9 +281,9 @@ def _default_base_config(
"read_coordinates_from": None,
"read_tiles_from": None,
"params": {
"target_spacing_um": 0.5,
"requested_spacing_um": 0.5,
"tolerance": 0.05,
"target_tile_size_px": 224,
"requested_tile_size_px": 224,
"overlap": 0.0,
"tissue_threshold": 0.1,
"drop_holes": False,
@@ -510,8 +510,8 @@ def _build_pipeline_from_config_dict(config: dict[str, Any]):

preprocessing = PreprocessingConfig(
backend=str(tiling_cfg.get("backend", "cucim")),
target_spacing_um=float(params.get("target_spacing_um", 0.5)),
target_tile_size_px=int(params.get("target_tile_size_px", 256)),
requested_spacing_um=float(params.get("requested_spacing_um", 0.5)),
requested_tile_size_px=int(params.get("requested_tile_size_px", 256)),
tolerance=float(params.get("tolerance", 0.05)),
overlap=float(params.get("overlap", 0.0)),
tissue_threshold=float(params.get("tissue_threshold", 0.01)),
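
Because the benchmark builders now read `params.get("requested_spacing_um", 0.5)` and `params.get("requested_tile_size_px", ...)`, a config dict that still carries the old `target_*` keys would fall back to the defaults without raising. One way to guard against that is a small key-migration step before building `PreprocessingConfig`. The helper below is a hypothetical sketch, not part of this PR; the name `migrate_tiling_params` and the choice to raise on conflicting keys are assumptions.

```python
from typing import Any

# Old -> new parameter names, taken from the renames in this diff.
RENAMED_KEYS = {
    "target_spacing_um": "requested_spacing_um",
    "target_tile_size_px": "requested_tile_size_px",
    "target_region_size_px": "requested_region_size_px",
}


def migrate_tiling_params(params: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: return a copy of params with old keys renamed."""
    migrated = dict(params)  # leave the caller's dict untouched
    for old_key, new_key in RENAMED_KEYS.items():
        if old_key in migrated:
            if new_key in migrated:
                # Refuse to guess which value the user meant.
                raise ValueError(f"both {old_key!r} and {new_key!r} are set")
            migrated[new_key] = migrated.pop(old_key)
    return migrated


legacy = {"target_spacing_um": 0.25, "target_tile_size_px": 256, "tolerance": 0.05}
params = migrate_tiling_params(legacy)
print(params.get("requested_spacing_um", 0.5))  # 0.25 rather than the 0.5 default
```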
4 changes: 2 additions & 2 deletions scripts/generate_gt.py
@@ -27,9 +27,9 @@

# Must stay in sync with test_output_consistency.py
TILING_PARAMS = dict(
target_spacing_um=0.5,
requested_spacing_um=0.5,
tolerance=0.07,
target_tile_size_px=224,
requested_tile_size_px=224,
overlap=0.0,
tissue_threshold=0.1,
drop_holes=False,
70 changes: 35 additions & 35 deletions slide2vec/api.py
@@ -42,9 +42,9 @@ class SlideLike(Protocol):
@dataclass(frozen=True, kw_only=True)
class PreprocessingConfig:
backend: str = "auto"
target_spacing_um: float | None = None
target_tile_size_px: int | None = None
target_region_size_px: int | None = None
requested_spacing_um: float | None = None
requested_tile_size_px: int | None = None
requested_region_size_px: int | None = None
region_tile_multiple: int | None = None
tolerance: float = 0.05
overlap: float = 0.0
@@ -75,11 +75,11 @@ def from_config(cls, cfg: Any) -> "PreprocessingConfig":
preview_downsample = int(preview_cfg.downsample)
return cls(
backend=tiling.backend,
target_spacing_um=float(tiling.params.target_spacing_um),
target_tile_size_px=int(tiling.params.target_tile_size_px),
target_region_size_px=(
requested_spacing_um=float(tiling.params.requested_spacing_um),
requested_tile_size_px=int(tiling.params.requested_tile_size_px),
requested_region_size_px=(
int(v)
if (v := getattr(tiling.params, "target_region_size_px", None)) is not None
if (v := getattr(tiling.params, "requested_region_size_px", None)) is not None
else None
),
region_tile_multiple=(
@@ -454,28 +454,28 @@ def ensure_defaults() -> tuple[int, float]:
return defaults

if preprocessing is None:
target_tile_size_px, target_spacing_um = ensure_defaults()
requested_tile_size_px, requested_spacing_um = ensure_defaults()
return _resolve_hierarchical_preprocessing(
PreprocessingConfig(
backend="auto",
target_spacing_um=target_spacing_um,
target_tile_size_px=target_tile_size_px,
requested_spacing_um=requested_spacing_um,
requested_tile_size_px=requested_tile_size_px,
)
)

target_spacing_um = preprocessing.target_spacing_um
target_tile_size_px = preprocessing.target_tile_size_px
if target_spacing_um is None or target_tile_size_px is None:
requested_spacing_um = preprocessing.requested_spacing_um
requested_tile_size_px = preprocessing.requested_tile_size_px
if requested_spacing_um is None or requested_tile_size_px is None:
default_tile_size_px, default_spacing_um = ensure_defaults()
if target_spacing_um is None:
target_spacing_um = default_spacing_um
if target_tile_size_px is None:
target_tile_size_px = default_tile_size_px
if requested_spacing_um is None:
requested_spacing_um = default_spacing_um
if requested_tile_size_px is None:
requested_tile_size_px = default_tile_size_px
return _resolve_hierarchical_preprocessing(
replace(
preprocessing,
target_spacing_um=target_spacing_um,
target_tile_size_px=target_tile_size_px,
requested_spacing_um=requested_spacing_um,
requested_tile_size_px=requested_tile_size_px,
)
)

@@ -484,7 +484,7 @@ def _default_preprocessing_from_registry(name: str | None) -> tuple[int, float]:
if not name or name not in encoder_registry:
raise ValueError(
"Cannot infer preprocessing defaults without a registered model. "
"Pass preprocessing.target_spacing_um and preprocessing.target_tile_size_px explicitly."
"Pass preprocessing.requested_spacing_um and preprocessing.requested_tile_size_px explicitly."
)

defaults = resolve_preprocessing_defaults(name)
@@ -499,7 +499,7 @@ def _validate_model_config(
name = model.name
if name not in encoder_registry:
return
if preprocessing.region_tile_multiple is not None or preprocessing.target_region_size_px is not None:
if preprocessing.region_tile_multiple is not None or preprocessing.requested_region_size_px is not None:
info = encoder_registry.info(name)
if info["level"] != "tile":
raise ValueError("Hierarchical preprocessing is only supported for tile encoders")
@@ -508,8 +508,8 @@ def _validate_model_config(
precision = None if on_cpu or execution is None else execution.precision
validate_encoder_config(
name,
target_tile_size_px=preprocessing.target_tile_size_px,
target_spacing_um=preprocessing.target_spacing_um,
requested_tile_size_px=preprocessing.requested_tile_size_px,
requested_spacing_um=preprocessing.requested_spacing_um,
precision=precision,
output_variant=model._output_variant,
allow_non_recommended=bool(model.allow_non_recommended_settings),
@@ -518,32 +518,32 @@ def _validate_model_config(

def _resolve_hierarchical_preprocessing(preprocessing: PreprocessingConfig) -> PreprocessingConfig:
multiple = preprocessing.region_tile_multiple
target_region_size_px = preprocessing.target_region_size_px
requested_region_size_px = preprocessing.requested_region_size_px
if multiple is not None:
multiple = int(multiple)
if multiple < 2:
raise ValueError("region_tile_multiple must be at least 2")
if multiple is None and target_region_size_px is None:
if multiple is None and requested_region_size_px is None:
return preprocessing
if preprocessing.target_tile_size_px is None:
if preprocessing.requested_tile_size_px is None:
raise ValueError(
"target_tile_size_px must be resolved before deriving hierarchical region geometry"
"requested_tile_size_px must be resolved before deriving hierarchical region geometry"
)
if target_region_size_px is None:
target_region_size_px = int(preprocessing.target_tile_size_px) * int(multiple)
if requested_region_size_px is None:
requested_region_size_px = int(preprocessing.requested_tile_size_px) * int(multiple)
elif multiple is None:
if int(target_region_size_px) % int(preprocessing.target_tile_size_px) != 0:
if int(requested_region_size_px) % int(preprocessing.requested_tile_size_px) != 0:
raise ValueError(
"target_region_size_px must be an exact multiple of target_tile_size_px"
"requested_region_size_px must be an exact multiple of requested_tile_size_px"
)
multiple = int(target_region_size_px) // int(preprocessing.target_tile_size_px)
elif int(target_region_size_px) != int(preprocessing.target_tile_size_px) * int(multiple):
multiple = int(requested_region_size_px) // int(preprocessing.requested_tile_size_px)
elif int(requested_region_size_px) != int(preprocessing.requested_tile_size_px) * int(multiple):
raise ValueError(
"target_region_size_px must match target_tile_size_px * region_tile_multiple"
"requested_region_size_px must match requested_tile_size_px * region_tile_multiple"
)
return replace(
preprocessing,
target_region_size_px=int(target_region_size_px),
requested_region_size_px=int(requested_region_size_px),
region_tile_multiple=int(multiple),
)
