Skip to content

MicroSplit inputs generation MWE#440

Draft
CatEek wants to merge 33 commits intomainfrom
iz/feat/ms_nextgen_ds
Draft

MicroSplit inputs generation MWE#440
CatEek wants to merge 33 commits intomainfrom
iz/feat/ms_nextgen_ds

Conversation

@CatEek
Copy link
Copy Markdown
Contributor

@CatEek CatEek commented Mar 13, 2025

Description

Background - why do we need this PR?

Adds functionality needed for MicroSplit into the next-gen dataset.

Overview - what changed?

  • Adds pydantic config for PatchExtractor with corresponding changes to DataConfig
  • Changes PatchSpecsGenerator adding the logic to extract patches for different locations
  • Changes ImageStack class adding the logic for extracting patches for different locations for each channel, LC, etc.

New files

  • PatchExtractorConfig added to src/careamics/config/data/patch_extractor.py
  • _get_coords added to RandomPatchSpecsGenerator.generate in src/careamics/dataset_ng/patching_strategies/patch_specs_generator..py
  • _get_patches, _get_lc_patches and _composite_patch methods added in src/careamics/dataset_ng/patch_extractor/image_stack/in_memory_image_stack.py

New features

  • Using dataset with/without input patch extractor
  • Extracting patches from 2+ target channels to create artificial input
  • Extracting these from different spatial locations
  • Mixing patches from different channels logic
  • Adding LC patches

Additional Notes and Examples

Uses dataset class proposed by Vera. This is a rough example rather than a ready prototype. Many things will probably change/move before this can be used.

Please ensure your PR meets the following requirements:

  • Code builds and passes tests locally, including doctests
  • New tests have been added (for bug fixes/features)
  • Pre-commit passes
  • PR to the documentation exists (for bug fixes / features)

@CatEek CatEek marked this pull request as draft March 13, 2025 16:20
@CatEek CatEek requested review from jdeschamps, melisande-c and veegalinova and removed request for melisande-c March 13, 2025 16:20
@jdeschamps
Copy link
Copy Markdown
Member

There are a lot of conflicts since last week's PR, in order to not review the PR too many times, I'd suggest fixing these before and implementing already what @melisande-c pointed out: moving logic non related to image loading out of ImageStack.

Can we have a summary of what you tried and didn't work? We can then discuss it, that would also avoid sparking the same questions as before.

That would make it easier to review!

Thanks!

@melisande-c
Copy link
Copy Markdown
Member

So yeah I see what the problem is, the MicroSplit data extraction needs 3 levels of processing to:

  1. extract the data
  2. construct the LC input for each patch
  3. combine data from different channels into one patch

We have 3 levels of processing, those being the CAREamicsDataset, the PatchExtractor and the ImageStack, but since here you've tried to keep the dataset class clear of MicroSplit functionality and put the channel-patch combination logic into the PatchExtractor, it leaves no choice but to combine the data-extraction and LC-construction steps into the lowest level in the ImageStack.

It's not very desirable to put the LC logic in the ImageStack class because it means the class no longer has a single responsibility and it has too much knowledge of higher level functionality. The result of this is that every time we write a new ImageStack class we have to re-write the LC logic, and this is not very friendly for people who want to write their own ImageStack classes.

The more I look at this the more it is clear that there should just be a separate MicroSplitDataset class, that can still reuse all the components of the NG-Dataset. I think MicroSplit has too much hard to reconcile functionality to try and combine the datasets, and it will be easy to have a code path that selects the correct dataset if it is a MicroSplit algorithm. This won't lead to a Dataset class explosion like we had before because the same dataset should still be used for training, validation and prediction.

The MicroSplitDataset could handle the channel-patch combination, and an LCPatchExtractor could handle LC construction. The MicroSplitDataset could have N PatchingStrategy instances for N channels. We could add a extract_channel_patch to all the ImageStacks and the PatchExtractor, which would take an additional channel_idx argument. This way, in the dataset __getitem__, each channel-patch would be extracted separately with a different PatchSpec for each channel, combined and returned. There would be no need to make any modifications to the existing PatchingStrategy classes, and we would only have to extend the interface of the ImageStacks and the PatchExtractor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants