
Bug in RGB augmentation ? #2

Open · EiffL opened this issue Sep 28, 2023 · 2 comments

EiffL commented Sep 28, 2023

While trying to understand exactly what transformation to apply to some new raw data I want to get embeddings for, I noticed what I think might be a small bug here:

image = image * (self.scales + self.m * (fI / I)[..., np.newaxis]).astype(np.float32)

This does not seem to match the dr2_rgb formula, which has for this line:

rgb[:,:,plane] = (img * scale + m) * fI / I
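
Concretely, I believe the vectorized line with the parentheses matching dr2_rgb would read something like this (just a sketch, assuming self.scales and self.m broadcast over the band axis exactly as in the current code):

```python
# dr2_rgb grouping: (img * scale + m) * fI / I, vectorized over bands
image = ((image * self.scales + self.m) * (fI / I)[..., np.newaxis]).astype(np.float32)
```

The difference matters: the current line expands to image * scales + image * m * fI / I, so the image * scales term never gets the fI / I range compression applied, whereas dr2_rgb compresses the whole (img * scale + m) term.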

Visually this translates into the following effect. Here is what a galaxy looks like in an imshow if I apply ToRGB to the raw images:
[screenshot: galaxy rendered with ToRGB]
And here is what it looks like if I instead use the dr2_rgb function:
[screenshot: galaxy rendered with dr2_rgb]

The dr2_rgb version is less saturated and more detail is visible.

I don't think it is a huge deal, but I would bet that the quality of the embeddings would actually be even better with this range compression formula.

If indeed this was unintended, I'm happy to make a PR/new branch.

EiffL changed the title from "Bug in RGB augmentation" to "Bug in RGB augmentation ?" on Sep 28, 2023
georgestein (Owner) commented
Nice catch!

Looks like you are correct, and the scaling used for training/inference indeed had the bug you point out. At least it was consistent across both. But I agree the default scaling would very likely improve things further and should be used in any future versions.

In the visualizations the scaling is correct, as they use ssl_legacysurvey/utils/to_rgb.py and not the bugged version in the decals ssl augmentations. But it seems that when I vectorized the for loop over for img, band in zip(imgs, bands) I moved a parenthesis to where it shouldn't be...
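
For reference, the per-band loop in to_rgb.py does roughly this (a paraphrase of the sdss_rgb-style loop, not an exact copy):

```python
# Per-band loop form that the augmentation vectorizes:
for img, band in zip(imgs, bands):
    plane, scale = rgbscales[band]
    rgb[:, :, plane] = (img * scale + m) * fI / I
```

so the vectorized version needs to keep the (img * scale + m) grouping intact before multiplying by fI / I.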

The pretrained models we release should keep this scaling, but anything new should be trained with the corrected one. Please make a new branch for the fix - I wouldn't want the change to be merged into master and then accidentally used in inference when loading the previously trained models.

EiffL (Author) commented Sep 28, 2023

Yeah exactly, so... maybe it's best to leave it as-is, because at least it's consistent. I find that the similarity searches return pretty 'similar' results whether I use the fixed scaling or not, and that's probably thanks to the color jitter augmentations used during training.

Maybe one way to fix it: if someone retrains these models, they could introduce a new "ToRGBv2" augmentation that applies the corrected scaling, along the lines of the sketch below.
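
Very roughly, something like this (a sketch only: the class name, the (H, W, 3) g/r/z channel convention, and the default scales/m/Q are illustrative assumptions based on what I recall of dr2_rgb's defaults):

```python
import numpy as np

class ToRGBv2:
    """Range compression with the dr2_rgb grouping.

    Assumes an (H, W, 3) input in g/r/z order; output channels follow
    the input band order. Defaults are meant to mirror dr2_rgb.
    """

    def __init__(self, scales=(6.0, 3.4, 2.2), m=0.03, Q=20.0):
        self.scales = np.asarray(scales, dtype=np.float32)  # per-band scale
        self.m = m  # softening offset
        self.Q = Q  # arcsinh stretch strength

    def __call__(self, image):
        # Mean scaled intensity across bands, floored at zero.
        I = np.mean(np.maximum(0.0, image * self.scales + self.m), axis=-1)
        I = I + (I == 0.0) * 1e-6  # avoid division by zero
        fI = np.arcsinh(self.Q * I) / np.sqrt(self.Q)
        # The fix: apply fI / I to the whole (img * scale + m) term.
        rgb = (image * self.scales + self.m) * (fI / I)[..., np.newaxis]
        return np.clip(rgb, 0.0, 1.0).astype(np.float32)
```

That way the existing ToRGB stays untouched for the released checkpoints, and retrained models can opt into the corrected version.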

Anyway, I mostly just wanted to know how to properly normalize the data :-)
