Training on Kubric Dataset #35
Comments
Apologies for the slow response. This is likely just compilation time: the training graph is complex and the JAX GPU compiler is slow, so compilation might take hours. Debugging this is somewhat time-consuming for us, so we haven't dug into it yet. Hopefully we will find time to do so soon.
@yangyi02 Thank you for your response. I did enable the GPU and the model was built on the GPU as well; however, the execution stops midway and training doesn't take place.
@TahaRazzaq From the screenshot, I don't see that training stops. Could you verify whether the training output just hangs there (if so, could you wait for, e.g., 1 hour?) or has indeed completely stopped? You can set batch_dim in tapir_config.py to 1 to see if that gives you slightly faster verification.
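One way to tell a hang apart from a silent exit is to launch the training command from a small wrapper that timestamps each output line and reports the process's return code. The following is only a minimal sketch using the Python standard library; the helper names `run_and_report` and `describe` are hypothetical and not part of the TAPIR code base, and whether the Colab runtime actually kills the process (for example, when it runs out of memory) is an assumption to be checked, not a confirmed diagnosis.

```python
import subprocess
import sys
import time


def run_and_report(cmd):
    """Run cmd, stream its output with timestamps, return (return_code, seconds)."""
    start = time.monotonic()
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so late error messages are not lost
        text=True,
    )
    for line in proc.stdout:
        # A long gap between timestamps suggests slow work (e.g. compilation),
        # whereas the loop ending means the process has really exited.
        print(f"[{time.monotonic() - start:8.1f}s] {line}", end="")
    return_code = proc.wait()
    return return_code, time.monotonic() - start


def describe(return_code):
    """Turn a return code into a short human-readable verdict."""
    if return_code == 0:
        return "exited normally"
    if return_code < 0:
        # On POSIX, a negative code -N means the process was killed by signal N.
        return f"killed by signal {-return_code}"
    return f"exited with error code {return_code}"
```

For the command in this issue, the call would be `run_and_report([sys.executable, "./experiment.py", "--config", "./configs/tapir_config.py"])`, followed by `print(describe(code))` on the returned code. If the wrapper never returns, the process is still alive and probably compiling; a nonzero or negative return code means it really did stop.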
@yangyi02 The execution stops, since I'm able to run other cells. Even with batch_dim set to 1, the execution stops within 3-5 minutes. The last message printed is
I am trying to train the TAPIR model on the Kubric dataset using Google Colab; however, my code keeps stopping without any errors. I am using the
python ./experiment.py --config ./configs/tapir_config.py
command, and the config file is loaded successfully. The training process stops abruptly without any errors. I am unable to determine the cause and would be really grateful for any help in this regard. Thank you!