
Bug in RGB augmentation ? #2

Open · EiffL opened this issue Sep 28, 2023 · 2 comments

EiffL commented Sep 28, 2023

While trying to understand exactly what transformation to apply to some new raw data I want to get embeddings for, I noticed what I think might be a small bug here:

image = image * (self.scales + self.m * (fI / I)[..., np.newaxis]).astype(np.float32)

This does not seem to match the dr2_rgb formula, which has for this line:

rgb[:,:,plane] = (img * scale + m) * fI / I
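
Concretely, I believe the vectorized line with the parentheses matching dr2_rgb would read something like this (just a sketch, assuming self.scales and self.m broadcast over the band axis exactly as in the current code):

```python
# dr2_rgb grouping: (img * scale + m) * fI / I, vectorized over bands
image = ((image * self.scales + self.m) * (fI / I)[..., np.newaxis]).astype(np.float32)
```

The difference matters: the current line expands to image * scales + image * m * fI / I, so the image * scales term never gets the fI / I range compression applied, whereas dr2_rgb compresses the whole (img * scale + m) term.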

Visually this translates into the following effect. Here is what a galaxy looks like in an imshow if I apply ToRGB to the raw images:
[screenshot: galaxy rendered with ToRGB]
And here is what it looks like if I instead use the dr2_rgb function:
[screenshot: galaxy rendered with dr2_rgb]

The dr2_rgb version is less saturated and more detail is visible.

I don't think it is a huge deal, but I would bet that the quality of the embeddings would actually be even better with this range compression formula.

If indeed this was unintended, I'm happy to make a PR/new branch.

EiffL changed the title from "Bug in RGB augmentation" to "Bug in RGB augmentation ?" on Sep 28, 2023
georgestein (Owner) commented
Nice catch!

Looks like you are correct, and the scaling used for training/inference indeed had the bug you point out. At least it was consistent across both. But I agree the default scaling would very likely improve things further and should be used in any future versions.

In the visualizations the scaling is correct, as they use ssl_legacysurvey/utils/to_rgb.py and not the bugged version in the decals ssl augmentations. But it seems that when I vectorized the for loop over for img, band in zip(imgs, bands) I moved a parenthesis to where it shouldn't be...
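
For reference, the per-band loop in to_rgb.py does roughly this (a paraphrase of the sdss_rgb-style loop, not an exact copy):

```python
# Per-band loop form that the augmentation vectorizes:
for img, band in zip(imgs, bands):
    plane, scale = rgbscales[band]
    rgb[:, :, plane] = (img * scale + m) * fI / I
```

so the vectorized version needs to keep the (img * scale + m) grouping intact before multiplying by fI / I.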

The pretrained models we release should keep this scaling, but anything new should be trained with the corrected one. Please make a new branch for the fix - I wouldn't want the change to be merged into master and then accidentally used in inference when loading the previously trained models.

EiffL (Author) commented Sep 28, 2023

Yeah exactly, so... maybe it's best to leave it as-is, because at least it's consistent. I find that the similarity searches return pretty 'similar' results whether I use the fixed scaling or not, and that's probably thanks to the color jitter augmentations used during training.

Maybe one way to fix it: if someone retrains these models, they could introduce a new "ToRGBv2" augmentation that applies the corrected scaling, along the lines of the sketch below.
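
Very roughly, something like this (a sketch only: the class name, the (H, W, 3) g/r/z channel convention, and the default scales/m/Q are illustrative assumptions based on what I recall of dr2_rgb's defaults):

```python
import numpy as np

class ToRGBv2:
    """Range compression with the dr2_rgb grouping.

    Assumes an (H, W, 3) input in g/r/z order; output channels follow
    the input band order. Defaults are meant to mirror dr2_rgb.
    """

    def __init__(self, scales=(6.0, 3.4, 2.2), m=0.03, Q=20.0):
        self.scales = np.asarray(scales, dtype=np.float32)  # per-band scale
        self.m = m  # softening offset
        self.Q = Q  # arcsinh stretch strength

    def __call__(self, image):
        # Mean scaled intensity across bands, floored at zero.
        I = np.mean(np.maximum(0.0, image * self.scales + self.m), axis=-1)
        I = I + (I == 0.0) * 1e-6  # avoid division by zero
        fI = np.arcsinh(self.Q * I) / np.sqrt(self.Q)
        # The fix: apply fI / I to the whole (img * scale + m) term.
        rgb = (image * self.scales + self.m) * (fI / I)[..., np.newaxis]
        return np.clip(rgb, 0.0, 1.0).astype(np.float32)
```

That way the existing ToRGB stays untouched for the released checkpoints, and retrained models can opt into the corrected version.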

Anyway, I mostly just wanted to know how to properly normalize the data :-)
