Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix intensity normalizations #981

Merged
merged 6 commits into from
Sep 10, 2024

Conversation

qin-yu
Copy link
Contributor

@qin-yu qin-yu commented Jul 25, 2024

Hi @carsen-stringer!

During the process of packaging the custom Cellpose model for bioimage.io, I identified two potential issues with the intensity normalization implementation in Cellpose. This PR proposes fixes for:

  1. The current normalization with the lowhigh argument is not channel-wise. In this PR, lowhigh now either takes a 2-tuple (low, high) for all channels, or a nchan-tuple of that for each channel.
  2. The current invert mechanism does not necessarily invert the channel to be segmented (--chan); instead, it inverts the last channel. Depending on the --chan2 value, this might not be the intended channel. In this PR, all channels are inverted if invert.
Original PR message on 25 Jul 2024

Original PR

Hi @carsen-stringer!

During the process of packaging the custom Cellpose model for bioimage.io, I identified two potential issues with the intensity normalization implementation in Cellpose. This PR proposes fixes for:

  1. The current min-max normalization with the lowhigh argument is not channel-wise, and the argument itself might be misused.
  2. The current invert mechanism does not necessarily invert the channel to be segmented (--chan); instead, it inverts the last channel. Depending on the --chan2 value, this might not be the intended channel.

Please review these changes as I might have misunderstood some parts.


Fix for Min-Max Normalization by 31bc5be

For cellpose.transforms.normalize_img() (channel-wise normalization), the lowhigh option is provided. I assume this corresponds to min-max normalization where lowhigh is a tuple representing (new minimum, new maximum).

lowhigh (tuple, optional): The lower and upper bounds for normalization. If provided, it should be a tuple
of two values. Defaults to None.

If lowhigh represents (old minimum, old maximum), then normalization is not channel-wise if lowhigh has a length of 2:

if lowhigh is not None:
assert len(lowhigh) == 2
assert lowhigh[1] > lowhigh[0]

In either case, the implementation needs fixing:

if lowhigh is not None:
for c in range(nchan):
img_norm[...,
c] = (img_norm[..., c] - lowhigh[0]) / (lowhigh[1] - lowhigh[0])

The current implementation is essentially: img_norm = (img_norm - lowhigh[0]) / (lowhigh[1] - lowhigh[0])

This has been fixed by commit 31bc5be.


Fix for Invert by 91469d1

For cellpose.transforms.normalize_img() (channel-wise normalization), there is an invert option for cases where "cells are dark instead of bright":

invert (bool, optional): Whether to invert the image. Useful if cells are dark instead of bright.
Defaults to False.

In Cellpose, the invert function is performed after reshape(), which moves the main channel to channel 0 and the auxiliary channel to channel 1. Therefore, the primary use case is to invert the first channel (channel 0) instead of the last one (current implementation):

if (tile_norm_blocksize > 0 or normalize) and invert:
img_norm[..., c] = -1 * img_norm[..., c] + 1

The variable c retains the value from a previous loop, being 1 if nchan is 2 or 2 if nchan is 3.

This has been fixed by commit 91469d1, but further improvements can be made to allow for a user-specified list of channels to be inverted.

qin-yu added 2 commits July 24, 2024 01:47
1. Make it actually channel-wise;
2. Before it should take `(old_min, old_max)` of the input image (otherwise it doesn't make sense), but now it takes `(new_min, new_max)` of the expected normalized output.

Before this fix, if `lowhigh` is provided, channel-wise linear normalization is performed by:

```python
for c in ...:
    img_norm[...,c] = (img_norm[..., c] - lowhigh[0]) / (lowhigh[1] - lowhigh[0])
```

which has a redundant `[..., c]` and is not "channel-wise", i.e. it is equivalent to

```python
img_norm = (img_norm - lowhigh[0]) / (lowhigh[1] - lowhigh[0])
```

This parameter, `lowhigh`, is never used in Cellpose itself, however.
Previously, only the last channel was inverted. Now, channel 0 is inverted, which is typically cells used for segmentation. This update addresses the common use case where the biological structure of interest is in the first channel. Future improvements could allow for user-specified channels.
@carsen-stringer
Copy link
Member

thanks @qin-yu ! the lowhigh is meant to represent the values to set to 0 and 1 (e.g. pixel value 100->0 and 1100->1 corresponds to lowhigh = [100,1100]. I think the fix here would be to check if lowhigh is a list of lists (per channel) or a single list (for all channels). It's a good point that this use-case isn't covered currently.

for the inversion, thanks for catching that, I think it would make sense to apply for all channels?

Copy link

codecov bot commented Sep 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.70%. Comparing base (19b65ce) to head (4df2731).
Report is 14 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #981      +/-   ##
==========================================
+ Coverage   52.76%   53.70%   +0.93%     
==========================================
  Files          18       18              
  Lines        4126     4132       +6     
==========================================
+ Hits         2177     2219      +42     
+ Misses       1949     1913      -36     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@qin-yu
Copy link
Contributor Author

qin-yu commented Sep 9, 2024

thanks @qin-yu ! the lowhigh is meant to represent the values to set to 0 and 1 (e.g. pixel value 100->0 and 1100->1 corresponds to lowhigh = [100,1100]. I think the fix here would be to check if lowhigh is a list of lists (per channel) or a single list (for all channels). It's a good point that this use-case isn't covered currently.

Thanks for explaining! That makes sense, working on it.

for the inversion, thanks for catching that, I think it would make sense to apply for all channels?

Yes you are right, inverting all channels by default if invert == True seems natural to me. But maybe this could also be either a bool or tuple(bool, ...)?

@carsen-stringer
Copy link
Member

Thanks so much, regarding the invert function I think it's fine with all channels, I'm not sure how much it's being used with multichannel images

* `lowhigh` could be either a 2-tuple or a `nchan`-tuple of 2-tuple
* `invert` inverts all channels

Note that `lowhigh` shouldn't be combined with pre-smoothing or -sharpening.
* Tested the changes made in transform.py
* Removed unused import and formatted whitespace
@qin-yu
Copy link
Contributor Author

qin-yu commented Sep 10, 2024

Thanks so much, regarding the invert function I think it's fine with all channels, I'm not sure how much it's being used with multichannel images

You are right. Let's keep it simple

I have finalized the PR, including two fixes along with the corresponding tests. Please take a look when you have a moment! @carsen-stringer

@qin-yu
Copy link
Contributor Author

qin-yu commented Sep 10, 2024

The failed check seems to be incidental and maybe unrelated to the PR.

@qin-yu
Copy link
Contributor Author

qin-yu commented Sep 10, 2024

All checks passed 🥳

@carsen-stringer carsen-stringer merged commit 385891c into MouseLand:main Sep 10, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants