Data augmentation #141

Merged
merged 18 commits into from
Jun 28, 2024

sfmig (Collaborator) commented Mar 27, 2024

Rebase after #203 is merged


This PR adds a few data augmentation transforms that we think could be helpful.

Specifically,

  • adds some preselected transforms with reasonable values to the config yaml file,
  • adds a CLI option --no_data_augmentation to skip all data augmentation during training,
  • adds a CLI option --log_data_augmentation to log the data augmentations linked to the datamodule as MLflow artefacts,
  • adds a notebook for visualisation,
  • adds data augmentation tests.
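
A minimal sketch of how the two CLI options listed above could interact with the config. The `data_augmentation` key, the `select_augmentations` helper, the help text, and the transform names and values are all hypothetical and only illustrate the idea; the argument parsing and key names in the repository may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser exposing the two flags named in this PR (help text is illustrative)."""
    parser = argparse.ArgumentParser(description="Train a detector (sketch)")
    parser.add_argument(
        "--no_data_augmentation",
        action="store_true",
        help="Skip all data augmentation during training",
    )
    parser.add_argument(
        "--log_data_augmentation",
        action="store_true",
        help="Log the data augmentation transforms as MLflow artefacts",
    )
    return parser


def select_augmentations(config: dict, no_data_augmentation: bool) -> dict:
    """Return the transforms to apply, or an empty dict if none apply.

    Assumption: transforms live under a hypothetical "data_augmentation" key
    that maps each transform name to its keyword arguments.
    """
    if no_data_augmentation:
        return {}
    return dict(config.get("data_augmentation", {}))


# Hypothetical config contents; the transform names and values are placeholders.
example_config = {
    "data_augmentation": {
        "color_jitter": {"brightness": 0.5, "hue": 0.3},
        "random_rotation": {"degrees": [-10.0, 10.0]},
    }
}

# With the flag: augmentation is skipped even though the config defines it.
args = build_parser().parse_args(["--no_data_augmentation"])
transforms_off = select_augmentations(example_config, args.no_data_augmentation)

# Without the flag: the transforms from the config are used.
args = build_parser().parse_args([])
transforms_on = select_augmentations(example_config, args.no_data_augmentation)
```

Keeping the decision in one small function means a config without the key and a run with the skip flag both end up on the same "no augmentation" path.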

codecov-commenter commented Mar 27, 2024

Codecov Report

Attention: Patch coverage is 62.22222% with 17 lines in your changes missing coverage. Please review.

Project coverage is 37.91%. Comparing base (87babb5) to head (54e45b6).

Files                                          Patch %   Lines
crabs/detection_tracking/detection_utils.py    35.71%    9 Missing ⚠️
crabs/detection_tracking/train_model.py        20.00%    8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #141      +/-   ##
==========================================
+ Coverage   37.05%   37.91%   +0.85%     
==========================================
  Files          20       20              
  Lines        1414     1440      +26     
==========================================
+ Hits          524      546      +22     
- Misses        890      894       +4     
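
The figures in this report can be cross-checked with a little arithmetic. The sketch below recomputes the base, head, and delta percentages from the hits and lines in the diff, and the patch coverage from the missing lines. Two points are assumptions: truncating to two decimals is inferred from the reported numbers (rounding would give slightly different values) and is not documented Codecov behaviour, and the 45-line patch size is derived from 17 missing lines at 62.22% rather than stated in the report.

```python
import math


def truncated_pct(fraction: float) -> float:
    """Convert a fraction to a percentage truncated (not rounded) to two decimals."""
    return math.floor(fraction * 10_000) / 100


base = truncated_pct(524 / 1414)  # hits / lines on main
head = truncated_pct(546 / 1440)  # hits / lines with this PR
delta = truncated_pct(546 / 1440 - 524 / 1414)  # difference of the raw ratios

# Assumption: 45 changed lines, inferred from 17 missing lines at 62.22%.
patch_lines = 45
patch_missing = 9 + 8  # missing lines in the two files listed in the report
patch_coverage = (patch_lines - patch_missing) / patch_lines * 100

print(base, head, delta, round(patch_coverage, 5))
```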

@sfmig sfmig force-pushed the smg/data-augm branch 3 times, most recently from a1c0a7c to 90da5cb Compare June 27, 2024 13:15
@sfmig sfmig marked this pull request as ready for review June 27, 2024 18:02
@sfmig sfmig requested a review from nikk-nikaznan June 27, 2024 18:02
nikk-nikaznan (Collaborator) left a comment

Cool stuff! I am happy with this: it runs well, and I also tried commenting out some of the variables. The only thing is that I think we need to add a guide for this, or a section in the README. Since we have never tested this properly, the defaults in the config will not necessarily give the best results, and some transforms might even harm the model. But this will be great for anyone who wants to start an ablation study on data augmentation.

@nikk-nikaznan nikk-nikaznan mentioned this pull request Jun 28, 2024
sfmig (Collaborator, Author) commented Jun 28, 2024

Aah, good call!
Yeah, my idea was to run a study first, then get a rough estimate of the best-performing set of parameters (rough because I am not optimising the parameters per transform, for example), and then have those as the defaults, as you say.

But I think having a guide on how to run a study like this could be helpful - I opened an issue.

thanks Nik!
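
One way the study discussed in this thread could be set up is a leave-one-out ablation over the transforms in the config. The sketch below only generates the set of runs to compare; it is not the procedure used by the authors, and the `leave_one_out_configs` helper, the run names, and the transform names and values are hypothetical.

```python
def leave_one_out_configs(transforms: dict) -> dict:
    """Map a run name to the transforms used in that run.

    Produces one baseline run with no augmentation, one run with every
    transform, and one run per transform with that transform removed.
    """
    runs = {"none": {}, "all": dict(transforms)}
    for name in transforms:
        runs[f"without_{name}"] = {
            key: params for key, params in transforms.items() if key != name
        }
    return runs


# Hypothetical transforms; names and parameter values are placeholders.
example_transforms = {
    "color_jitter": {"brightness": 0.5, "hue": 0.3},
    "random_rotation": {"degrees": [-10.0, 10.0]},
    "gaussian_blur": {"kernel_size": [5, 9]},
}

runs = leave_one_out_configs(example_transforms)
for run_name, run_transforms in runs.items():
    print(run_name, sorted(run_transforms))
```

Comparing each `without_*` run against `all` gives a rough indication of which transforms help and which hurt; as noted above, it does not tune the parameters of each transform.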

@sfmig sfmig merged commit 81db31e into main Jun 28, 2024
6 checks passed
@sfmig sfmig deleted the smg/data-augm branch June 28, 2024 09:46
sfmig added a commit that referenced this pull request Jul 8, 2024
* Move checkpoint type computation to utils

* Refactor checkpointing in training script

* Get ckpt type if ckpt is passed

* optionally apply a data augmentation method (WIP)

* fix config syntax in code

* add data augmentation notebook

* notebook to explore params of individual transformations

* add transforms from config

* Add keywords to datamodule params

* Optionally skip data augmentation

* If data augmentation key in config, apply

* Update tests

* Change tests to read default config

* Refactor transform functions and clean up

* update notebook

* Fix data augmentation default config

* Optionally log data augmentation transforms as artifacts

* Rename skip to 'no_data_augmentation'