Model training not getting completed/ Disconnected. Stuck at 100% #952
Comments
👋 Hello @Sudhir1609, thank you for reporting an issue about Ultralytics HUB 🚀! Please check out our HUB Docs for more information:
It looks like you've reported a 🐛 bug where the model gets stuck at 100% completion and doesn't finalize. To help us investigate and resolve this, could you please provide a minimum reproducible example (MRE)? This includes:
For guidance on creating an MRE, visit our Minimum Reproducible Example guide. 🛠️ An Ultralytics engineer will also review your issue and assist you shortly. Thank you for bringing this to our attention and for your patience! 😊
@Sudhir1609 Hello!
@sergiuwaxmann is this the correct URL?
@Sudhir1609 Yes, this URL points to your model. I can see your model is disconnected and the last epoch is
I've tried 'Resume Training' 5-6 times now, and every time it gets disconnected around the same epoch. I'm also worried about losing my funds.
Thank you for sharing the update, @Sudhir1609! I understand how frustrating this must be, especially with the concern about funds. To address this, please try the following steps:
If the issue persists and you've already tried the above steps, please let us know. You can also share with us any specific error messages or logs that appear before the disconnection. We'll investigate further to ensure this gets resolved for you. Thank you for your patience! 😊
I'm not able to change any configuration. Only the 'Resume Training' option and the option to change the instance are enabled. How can I reduce the number of epochs or tweak my dataset settings? @pderrenger
@Sudhir1609 Unfortunately, the number of epochs can't be changed after the model started training.
@sergiuwaxmann Thanks, I was facing the same problem and tried the same steps for this model too. Thanks for your help!
@Sudhir1609 You should have your account balance back.
I believe the size of your dataset is causing an out-of-memory (OOM) issue, but we are still investigating this.
@sergiuwaxmann I tried changing the instance between Thanks for the update. I'll try to change my dataset and try again.
@sergiuwaxmann I changed the dataset size and tried training the model again, but ran into the same problem. Can you please let me know about this?
@Sudhir1609 Thank you for your patience as we continue to investigate this issue. We're currently working to identify the root cause, but reproducing the problem has been challenging due to the large size of the dataset involved. Please rest assured that we're actively working on this and will keep you updated as soon as we have more information. Apologies for the inconvenience, and thank you for your understanding! 🙏
Thank you for your patience and understanding as we looked into this issue. We have successfully reproduced the issue on our end and identified the root cause. The development team has been informed and is actively working on a fix. We appreciate your cooperation and will update you as soon as the fix is deployed. Thank you! 😊
@yogendrasinghx
Search before asking
HUB Component
Models
Bug
It's constantly getting stuck at 100% and never completes.
Environment
Ultralytics HUB Version
v0.1.79
Client User Agent
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36
Operating System
Linux x86_64
Browser Window Size
1848 x 932
Server Timestamp
1734061093
Minimal Reproducible Example
No response
Additional
No response