# Update BMZ README (#322)
### Description

Following #278, this PR reorganizes the exported BMZ README:

- Move all configuration to the end
- Fix the title level of the algorithm section
- Put the data description first
- Remove the training section
- Add a validation section
- Fix the link to the documentation

In addition, the API for the BMZ export has changed (see the example call after this list):
- `data_description` is now mandatory
- `model_version` is an optional parameter allowing models to be versioned
- `covers` is an optional parameter
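
For reference, here is a minimal sketch of a call under the new signature; the parameter names follow the updated `export_to_bmz` diff below, while the `careamist` instance, the array, and the author entries are placeholders:

```python
import numpy as np

# `careamist` is assumed to be a trained CAREamist instance; all values below
# are placeholders illustrating the new parameters.
train_patch = np.random.rand(64, 64).astype(np.float32)  # shape must match the data config axes

careamist.export_to_bmz(
    path_to_archive="n2v_model.zip",
    friendly_model_name="Noise2Void - CAREamics",
    input_array=train_patch,
    authors=[{"name": "Jane Doe", "affiliation": "Example Lab"}],
    general_description="Noise2Void model trained with CAREamics.",
    data_description="Mydata",  # now mandatory
    covers=["cover.png"],  # optional, the path is not validated
    model_version="0.1.0",  # optional model versioning
)
```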

Users can now provide paths to cover images; no validation is performed on those paths. If no covers are provided, one is generated automatically, in a way that is probably not optimal for multichannel data (see the sketch after this list):
- For 2D images, the input and output are combined side by side when they have the same width; if their widths differ but their heights match, they are stacked vertically instead. If both width and height differ, an error is raised.
- For Z stacks, the middle slice is selected.
- For images with 2 channels, the channels are mapped to `green` and `blue` (the red channel is left empty).
- For images with 3 channels, they are interpreted as RGB.
- For images with 4 or more channels, the first 4 channels are used. The pixel values are normalised and multiplied with the RGB vectors of 4 pre-defined colors, and all channels are summed into an RGB image.
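
The 4-or-more-channel case boils down to a weighted sum of normalised channels. A simplified sketch of the idea is below; the palette is copied from `cover_factory.py` further down, while the helper name, the [0, 1] colour scaling, and the clipping are simplifications added here:

```python
import numpy as np

# Pre-defined palette from cover_factory.py, scaled to [0, 1] here so that the
# weighted sum stays close to the 8-bit range before clipping.
PALETTE = (
    np.array(
        [[255, 195, 0], [189, 226, 240], [96, 60, 76], [193, 225, 193]],
        dtype=float,
    )
    / 255.0
)


def composite_to_rgb(image: np.ndarray) -> np.ndarray:
    """Composite a (Y, X, 4) array into a (Y, X, 3) uint8 RGB image."""
    # normalise pixel values to [0, 255]
    norm = 255 * (image - image.min()) / (image.max() - image.min() + 1e-12)
    # weight each channel by its RGB colour and sum over the channel axis
    rgb = np.sum(norm[..., :, np.newaxis] * PALETTE[np.newaxis, np.newaxis], axis=-2)
    return np.clip(rgb, 0, 255).astype(np.uint8)
```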

Note that this is not a great way to represent scientific data; we should apply LUTs to the greyscale channels and recompose overlays. Lookup tables have been worked out for scientific figures, and we could use those.

If this is acceptable for now, it will be easy to replace later.

### Changes Made

- **Added**:
    - `cover_factory.py`
    - Helper scripts to inspect the BMZ README and covers.
- **Modified**: all BMZ-related modules.


### Related Issues

#278
#176

### Breaking changes

Any call to `careamist.export_to_bmz`.

### Additional Notes and Examples

The resulting README looks like this:

```markdown
# Noise2Void - CAREamics

## Data description

Mydata

## Algorithm description:

Noise2Void is a UNet-based self-supervised algorithm that uses blind-spot training to denoise images. In short, in every patches during training, random pixels are selected and their value replaced by a neighboring pixel value. The network is then trained to predict the original pixel value. The algorithm relies on the continuity of the signal (neighboring pixels have similar values) and the pixel-wise independence of the noise (the noise in a pixel is not correlated with the noise in neighboring pixels).

## Configuration

Noise2Void was trained using CAREamics (version 0.1.0) using the following configuration:


algorithm_config:
  algorithm: n2v
  loss: n2v
  lr_scheduler:
    name: ReduceLROnPlateau
    parameters: {}
  model:
    architecture: UNet
    conv_dims: 2
    depth: 2
    final_activation: None
    in_channels: 1
    independent_channels: true
    n2v2: false
    num_channels_init: 32
    num_classes: 1
  optimizer:
    name: Adam
    parameters:
      lr: 0.0001
data_config:
  axes: YX
  batch_size: 2
  data_type: array
  patch_size:
  - 64
  - 64
  transforms:
  - flip_x: true
    flip_y: true
    name: XYFlip
    p: 0.5
  - name: XYRandomRotate90
    p: 0.5
  - masked_pixel_percentage: 0.2
    name: N2VManipulate
    roi_size: 11
    strategy: uniform
    struct_mask_axis: none
    struct_mask_span: 5
experiment_name: export_bmz_readme
training_config:
  accumulate_grad_batches: 1
  check_val_every_n_epoch: 1
  checkpoint_callback:
    auto_insert_metric_name: false
    mode: min
    monitor: val_loss
    save_last: true
    save_top_k: 3
    save_weights_only: false
    verbose: false
  enable_progress_bar: true
  gradient_clip_algorithm: norm
  max_steps: -1
  num_epochs: 10
  precision: '32'
version: 0.1.0


## Validation

In order to validate the model, we encourage users to acquire a test dataset with ground-truth data. Additionally, inspecting the residual image (difference between input and predicted image) can be helpful to identify whether real signal is removed from the input image.

## References

Krull, A., Buchholz, T.O. and Jug, F., 2019. "Noise2Void - Learning denoising from single noisy images". In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2129-2137). doi: 10.1109/cvpr.2019.00223

## Links

- [CAREamics repository](https://github.com/CAREamics/careamics)
- [CAREamics documentation](https://careamics.github.io/)
```

---

**Please ensure your PR meets the following requirements:**

- [x] Code builds and passes tests locally, including doctests
- [x] New tests have been added (for bug fixes/features)
- [x] Pre-commit passes
- [ ] PR to the documentation exists (for bug fixes / features)
jdeschamps authored Dec 13, 2024
1 parent 6eb3627 commit 66e34f7
Showing 12 changed files with 296 additions and 52 deletions.
6 changes: 3 additions & 3 deletions .pre-commit-config.yaml
@@ -17,7 +17,7 @@ repos:
rev: v0.7.2
hooks:
- id: ruff
exclude: "^src/careamics/lvae_training/.*|^src/careamics/models/lvae/.*"
exclude: "^src/careamics/lvae_training/.*|^src/careamics/models/lvae/.*|^scripts/.*"
args: [--fix, --target-version, py38]

- repo: https://github.com/psf/black
@@ -31,7 +31,7 @@ repos:
- id: mypy
files: "^src/"
exclude: "^src/careamics/lvae_training/.*|^src/careamics/models/lvae/.*|^src/careamics/config/likelihood_model.py|^src/careamics/losses/loss_factory.py|^src/careamics/losses/lvae/losses.py"
args: ['--config-file', 'mypy.ini']
args: ["--config-file", "mypy.ini"]
additional_dependencies:
- numpy
- types-PyYAML
@@ -42,7 +42,7 @@
rev: v1.8.0
hooks:
- id: numpydoc-validation
exclude: "^src/careamics/lvae_training/.*|^src/careamics/models/lvae/.*|^src/careamics/losses/lvae/.*"
exclude: "^src/careamics/lvae_training/.*|^src/careamics/models/lvae/.*|^src/careamics/losses/lvae/.*|^scripts/.*"

# # jupyter linting and formatting
# - repo: https://github.com/nbQA-dev/nbQA
1 change: 1 addition & 0 deletions pyproject.toml
@@ -54,6 +54,7 @@ dependencies = [
'typer==0.12.3',
'scikit-image<=0.23.2',
'zarr<3.0.0',
'pillow<=10.3.0',
]

[project.optional-dependencies]
29 changes: 29 additions & 0 deletions scripts/export_bmz_readme.py
@@ -0,0 +1,29 @@
#!/usr/bin/env python
"""Export a README file for the bioimage model zoo."""
from pathlib import Path

from careamics.config import create_n2v_configuration
from careamics.model_io.bioimage._readme_factory import readme_factory


def main():
# create configuration
config = create_n2v_configuration(
experiment_name="export_bmz_readme",
data_type="array",
axes="YX",
patch_size=(64, 64),
batch_size=2,
num_epochs=10,
)
# export README
readme_path = readme_factory(
config=config, careamics_version="0.1.0", data_description="Mydata"
)

# copy file to __file__
readme_path.rename(Path(__file__).parent / "README.md")


if __name__ == "__main__":
main()
Empty file added scripts/export_covers.py
Empty file.
24 changes: 16 additions & 8 deletions src/careamics/careamist.py
@@ -866,9 +866,11 @@ def export_to_bmz(
friendly_model_name: str,
input_array: NDArray,
authors: list[dict],
general_description: str = "",
general_description: str,
data_description: str,
covers: Optional[list[Union[Path, str]]] = None,
channel_names: Optional[list[str]] = None,
data_description: Optional[str] = None,
model_version: str = "0.1.0",
) -> None:
"""Export the model to the BioImage Model Zoo format.
@@ -898,11 +900,15 @@
authors : list of dict
List of authors of the model.
general_description : str
General description of the model, used in the metadata of the BMZ archive.
channel_names : list of str, optional
Channel names, by default None.
data_description : str, optional
Description of the data, by default None.
General description of the model used in the BMZ metadata.
data_description : str
Description of the data the model was trained on.
covers : list of pathlib.Path or str, default=None
Paths to the cover images.
channel_names : list of str, default=None
Channel names.
model_version : str, default="0.1.0"
Version of the model.
"""
# TODO: add in docs that it is expected that input_array dimensions match
# those in data_config
@@ -921,11 +927,13 @@
path_to_archive=path_to_archive,
model_name=friendly_model_name,
general_description=general_description,
data_description=data_description,
authors=authors,
input_array=input_array,
output_array=output,
covers=covers,
channel_names=channel_names,
data_description=data_description,
model_version=model_version,
)

def get_losses(self) -> dict[str, list]:
58 changes: 25 additions & 33 deletions src/careamics/model_io/bioimage/_readme_factory.py
@@ -1,7 +1,6 @@
"""Functions used to create a README.md file for BMZ export."""

from pathlib import Path
from typing import Optional

import yaml

@@ -28,7 +27,7 @@ def _yaml_block(yaml_str: str) -> str:
def readme_factory(
config: Configuration,
careamics_version: str,
data_description: Optional[str] = None,
data_description: str,
) -> Path:
"""Create a README file for the model.
@@ -41,18 +40,14 @@
CAREamics configuration.
careamics_version : str
CAREamics version.
data_description : Optional[str], optional
Description of the data, by default None.
data_description : str
Description of the data.
Returns
-------
Path
Path to the README file.
"""
algorithm = config.algorithm_config
training = config.training_config
data = config.data_config

# create file
# TODO use tempfile as in the bmz_io module
with cwd(get_careamics_home()):
@@ -65,42 +60,39 @@

description = [f"# {algorithm_pretty_name}\n\n"]

# data description
description.append("## Data description\n\n")
description.append(data_description)
description.append("\n\n")

# algorithm description
description.append("Algorithm description:\n\n")
description.append("## Algorithm description:\n\n")
description.append(config.get_algorithm_description())
description.append("\n\n")

# algorithm details
# configuration description
description.append("## Configuration\n\n")

description.append(
f"{algorithm_flavour} was trained using CAREamics (version "
f"{careamics_version}) with the following algorithm "
f"parameters:\n\n"
)
description.append(
_yaml_block(yaml.dump(algorithm.model_dump(exclude_none=True)))
f"{careamics_version}) using the following configuration:\n\n"
)
description.append("\n\n")

# data description
description.append("## Data description\n\n")
if data_description is not None:
description.append(data_description)
description.append("\n\n")

description.append("The data was processed using the following parameters:\n\n")

description.append(_yaml_block(yaml.dump(data.model_dump(exclude_none=True))))
description.append(_yaml_block(yaml.dump(config.model_dump(exclude_none=True))))
description.append("\n\n")

# training description
description.append("## Training description\n\n")

description.append("The model was trained using the following parameters:\n\n")
# validation
description.append("# Validation\n\n")

description.append(
_yaml_block(yaml.dump(training.model_dump(exclude_none=True)))
"In order to validate the model, we encourage users to acquire a "
"test dataset with ground-truth data. Comparing the ground-truth data "
"with the prediction allows unbiased evaluation of the model performances. "
"This can be done for instance by using metrics such as PSNR, SSIM, or"
"MicroSSIM. In the absence of ground-truth, inspecting the residual image "
"(difference between input and predicted image) can be helpful to identify "
"whether real signal is removed from the input image.\n\n"
)
description.append("\n\n")

# references
reference = config.get_algorithm_references()
@@ -111,9 +103,9 @@

# links
description.append(
"## Links\n\n"
"# Links\n\n"
"- [CAREamics repository](https://github.com/CAREamics/careamics)\n"
"- [CAREamics documentation](https://careamics.github.io/latest/)\n"
"- [CAREamics documentation](https://careamics.github.io/)\n"
)

readme.write_text("".join(description))
171 changes: 171 additions & 0 deletions src/careamics/model_io/bioimage/cover_factory.py
@@ -0,0 +1,171 @@
"""Convenience function to create covers for the BMZ."""

from pathlib import Path

import numpy as np
from numpy.typing import NDArray
from PIL import Image

color_palette = np.array(
[
np.array([255, 195, 0]), # grey
np.array([189, 226, 240]),
np.array([96, 60, 76]),
np.array([193, 225, 193]),
]
)


def _get_norm_slice(array: NDArray) -> NDArray:
"""Get the normalized middle slice of a 4D or 5D array (SC(Z)YX).
Parameters
----------
array : NDArray
Array from which to get the middle slice.
Returns
-------
NDArray
Normalized middle slice of the input array.
"""
if array.ndim not in (4, 5):
raise ValueError("Array must be 4D or 5D.")

channels = array.shape[1] > 1
z_stack = array.ndim == 5

# get slice
if z_stack:
array_slice = array[0, :, array.shape[2] // 2, ...]
else:
array_slice = array[0, ...]

# channels
if channels:
array_slice = np.moveaxis(array_slice, 0, -1)
else:
array_slice = array_slice[0, ...]

# normalize
array_slice = (
255
* (array_slice - array_slice.min())
/ (array_slice.max() - array_slice.min())
)

return array_slice.astype(np.uint8)


def _four_channel_image(array: NDArray) -> Image:
"""Convert 4-channel array to Image.
Parameters
----------
array : NDArray
Normalized array to convert.
Returns
-------
Image
Converted array.
"""
colors = color_palette[np.newaxis, np.newaxis, :, :]
four_c_array = np.sum(array[..., :4, np.newaxis] * colors, axis=-2).astype(np.uint8)

return Image.fromarray(four_c_array).convert("RGB")


def _convert_to_image(original_shape: tuple[int, ...], array: NDArray) -> Image:
"""Convert to Image.
Parameters
----------
original_shape : tuple
Original shape of the array.
array : NDArray
Normalized array to convert.
Returns
-------
Image
Converted array.
"""
n_channels = original_shape[1]

if n_channels > 1:
if n_channels == 3:
return Image.fromarray(array).convert("RGB")
elif n_channels == 2:
# add an empty channel to the numpy array
array = np.concatenate([np.zeros_like(array[..., 0:1]), array], axis=-1)

return Image.fromarray(array).convert("RGB")
else: # more than 4
return _four_channel_image(array[..., :4])
else:
return Image.fromarray(array).convert("L").convert("RGB")


def create_cover(directory: Path, array_in: NDArray, array_out: NDArray) -> Path:
"""Create a cover image from input and output arrays.
Input and output arrays are expected to be SC(Z)YX. For images with a Z
dimension, the middle slice is taken.
Parameters
----------
directory : Path
Directory in which to save the cover.
array_in : numpy.ndarray
Array from which to create the cover image.
array_out : numpy.ndarray
Array from which to create the cover image.
Returns
-------
Path
Path to the saved cover image.
"""
# extract slice and normalize arrays
slice_in = _get_norm_slice(array_in)
slice_out = _get_norm_slice(array_out)

horizontal_split = slice_in.shape[-1] == slice_out.shape[-1]
if not horizontal_split:
if slice_in.shape[-2] != slice_out.shape[-2]:
raise ValueError("Input and output arrays have different shapes.")

# convert to Image
image_in = _convert_to_image(array_in.shape, slice_in)
image_out = _convert_to_image(array_out.shape, slice_out)

# split horizontally or vertically
if horizontal_split:
width = image_in.width // 2

cover = Image.new("RGB", (image_in.width, image_in.height))
cover.paste(image_in.crop((0, 0, width, image_in.height)), (0, 0))
cover.paste(
image_out.crop(
(image_in.width - width, 0, image_in.width, image_in.height)
),
(width, 0),
)
else:
height = image_in.height // 2

cover = Image.new("RGB", (image_in.width, image_in.height))
cover.paste(image_in.crop((0, 0, image_in.width, height)), (0, 0))
cover.paste(
image_out.crop(
(0, image_in.height - height, image_in.width, image_in.height)
),
(0, height),
)

# save
cover_path = directory / "cover.png"
cover.save(cover_path)

return cover_path
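
For illustration, a hypothetical call to the new helper; the import path follows the file location above, while the output directory and array shapes are placeholders (arrays are SC(Z)YX, as documented in `create_cover`):

```python
from pathlib import Path

import numpy as np

from careamics.model_io.bioimage.cover_factory import create_cover

# Placeholder single-channel 2D arrays in SCYX order.
rng = np.random.default_rng(42)
array_in = rng.random((1, 1, 128, 128))
array_out = rng.random((1, 1, 128, 128))

cover_path = create_cover(Path("."), array_in, array_out)  # writes ./cover.png
```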