-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug in RGB augmentation ? #2
Comments
Nice catch! Looks like you are correct and the scaling used for training/inference indeed had the bug you point out. At least it was consistent over both. But I agree the default scaling would very likely improve things further and should be used in any future versions. In the visualizations the scaling is correct as they use ssl_legacysurvey/utils/to_rgb.py and not the bugged version in the decals ssl augmentations, but it seems when I vectorized that for loop over for img,band in zip(imgs, bands) I moved a parentheses where it shouldn't be... The pretrained models we release should keep this scaling, but anything new should be trained with the corrected one. Please make a new branch for the fix - I wouldn't want the change to be merged into master and then accidentally used in inference when loading in the previously trained models. |
Yeah exactly, so... maybe best to leave as-is because at least it's consistent. I find that the similarity searches are pretty 'similar' whether I use the fixed scaling or not, and that's probably thanks to the color jitter augmentations used during training. Maybe one way to fix it is that if someone retrains these models, they could introduce a new "ToRGBv2" augmentation that would fix the problem. Anyways, I mostly cared to know how to properly normalize the data :-) |
Trying to understand exactly what transformation to apply to some new raw data I want to get embeddings for, I noticed what I think might be a little bug here:
ssl-legacysurvey/ssl_legacysurvey/data_loaders/decals_augmentations.py
Line 406 in 277a492
This does not seem to match the d2_rgb formula which has for this line:
This translates into the following effect. Here is how a galaxy looks like in an imshow if I apply ToRGB to the raw images:
And here is what it looks like if I instead use the
dr2_rgb
function:It is less saturated and more details are visible.
I don't think it is a huge deal, but I would bet that the quality of the embeddings would actually be even better with this range compression formula.
If indeed this was unintended, I'm happy to make a PR/new branch.
The text was updated successfully, but these errors were encountered: