MicroSplit inputs generation MWE by CatEek · Pull Request #440 · CAREamics/careamics

CatEek · 2025-03-13T16:20:14Z

Description

Background - why do we need this PR?

Adds functionality needed for MicroSplit into the next-gen dataset.

Overview - what changed?

Adds pydantic config for PatchExtractor with corresponding changes to DataConfig
Changes PatchSpecsGenerator adding the logic to extract patches for different locations
Changes ImageStack class adding the logic for extracting patches for different locations for each channel, LC, etc.

New files

PatchExtractorConfig added to src/careamics/config/data/patch_extractor.py
_get_coords added to RandomPatchSpecsGenerator.generate in src/careamics/dataset_ng/patching_strategies/patch_specs_generator..py
_get_patches, _get_lc_patches and _composite_patch methods added in src/careamics/dataset_ng/patch_extractor/image_stack/in_memory_image_stack.py

New features

Using dataset with/without input patch extractor
Extracting patches from 2+ target channels to create artificial input
Extracting these from different spatial locations
Mixing patches from different channels logic
Adding LC patches

Additional Notes and Examples

Uses dataset class proposed by Vera. This is a rough example rather than a ready prototype. Many things will probably change/move before this can be used.

Please ensure your PR meets the following requirements:

Code builds and passes tests locally, including doctests
New tests have been added (for bug fixes/features)
Pre-commit passes
PR to the documentation exists (for bug fixes / features)

…at a time

…to iz/feat/ms_nextgen_ds

jdeschamps · 2025-03-24T10:54:42Z

There are a lot of conflicts since last week's PR, in order to not review the PR too many times, I'd suggest fixing these before and implementing already what @melisande-c pointed out: moving logic non related to image loading out of ImageStack.

Can we have a summary of what you tried and didn't work? We can then discuss it, that would also avoid sparking the same questions as before.

That would make it easier to review!

Thanks!

melisande-c · 2025-03-26T08:41:48Z

So yeah I see what the problem is, the MicroSplit data extraction needs 3 levels of processing to:

extract the data
construct the LC input for each patch
combine data from different channels into one patch

We have 3 levels of processing, those being the CAREamicsDataset, the PatchExtractor and the ImageStack, but since here you've tried to keep the dataset class clear of MicroSplit functionality and put the channel-patch combination logic into the PatchExtractor, it leaves no choice but to combine the data-extraction and LC-construction steps into the lowest level in the ImageStack.

It's not very desirable to put the LC logic in the ImageStack class because it means the class no longer has a single responsibility and it has too much knowledge of higher level functionality. The result of this is that every time we write a new ImageStack class we have to re-write the LC logic, and this is not very friendly for people who want to write their own ImageStack classes.

The more I look at this the more it is clear that there should just be a separate MicroSplitDataset class, that can still reuse all the components of the NG-Dataset. I think MicroSplit has too much hard to reconcile functionality to try and combine the datasets, and it will be easy to have a code path that selects the correct dataset if it is a MicroSplit algorithm. This won't lead to a Dataset class explosion like we had before because the same dataset should still be used for training, validation and prediction.

The MicroSplitDataset could handle the channel-patch combination, and an LCPatchExtractor could handle LC construction. The MicroSplitDataset could have N PatchingStrategy instances for N channels. We could add a extract_channel_patch to all the ImageStacks and the PatchExtractor, which would take an additional channel_idx argument. This way, in the dataset __getitem__, each channel-patch would be extracted separately with a different PatchSpec for each channel, combined and returned. There would be no need to make any modifications to the existing PatchingStrategy classes, and we would only have to extend the interface of the ImageStacks and the PatchExtractor.

veegalinova and others added 24 commits February 21, 2025 17:31

add SequentialPatchSpecsGenerator prototype; add patch generator factory

6618141

change patch_extractor_factory to only return one optional extractor …

4af2f41

…at a time

swap init and generate parameters

3fd3793

add shape to patch_extractor, fix parameters

bdc7e4f

change create_patch_extractor to return only one extractor, add types

4d14651

add dtype to image_stack

32d5296

fix inits

dc0f07e

add dataset prototype

84eaf8e

add dataset demo

5c019df

renaming

aa25c41

add read function for custom

a220986

update demo

5169905

wip

c6ce16e

fix collate_fn by converting dtype to str

d23d225

format notebook

7e51d56

Merge remote-tracking branch 'origin/vg/feat/ng_dataset_prototype' in…

aa34f72

…to iz/feat/ms_nextgen_ds

Merge remote-tracking branch 'origin/main' into iz/feat/ms_nextgen_ds

31b7df0

config + patch_spec

6e6fd89

patch extraction sep channels

197eff7

rn param

3241992

rm ol file

cdb5bc0

mixing func

557a7af

lc logic wip

65d3159

cleanup

afa5b58

CatEek marked this pull request as draft March 13, 2025 16:20

CatEek requested review from jdeschamps, melisande-c and veegalinova and removed request for melisande-c March 13, 2025 16:20

CatEek and others added 9 commits March 28, 2025 23:07

Merge remote-tracking branch 'origin/main' into iz/feat/ms_nextgen_ds

42185e7

style(pre-commit.ci): auto fixes [...]

c41bdcd

merge

f5b4214

merge

eeea263

style(pre-commit.ci): auto fixes [...]

37e6d94

wip

5abeb1b

merge

d9f017c

wip

5aafe5b

style(pre-commit.ci): auto fixes [...]

adf434d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MicroSplit inputs generation MWE#440

MicroSplit inputs generation MWE#440
CatEek wants to merge 33 commits intomainfrom
iz/feat/ms_nextgen_ds

CatEek commented Mar 13, 2025 •

edited

Loading

Uh oh!

jdeschamps commented Mar 24, 2025

Uh oh!

melisande-c commented Mar 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

CatEek commented Mar 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Background - why do we need this PR?

Overview - what changed?

New files

New features

Additional Notes and Examples

Uses dataset class proposed by Vera. This is a rough example rather than a ready prototype. Many things will probably change/move before this can be used.

Uh oh!

jdeschamps commented Mar 24, 2025

Uh oh!

melisande-c commented Mar 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

CatEek commented Mar 13, 2025 •

edited

Loading