No such file or directory: 'runs/detect/train/weights/best.pt' #485
Comments
👋 Hello @Fistcar, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:
If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix. If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response. We try to respond to all issues as promptly as possible. Thank you for your patience! |
@Fistcar hello! I'm sorry to hear you're encountering issues with your model training on the HUB. The error you're experiencing indicates the training script is unable to locate the 'best.pt' file which should contain the weights of your best-performing model. This file is typically saved automatically during the training process when a new best metric is achieved.

If you cannot find the 'best.pt' file in the specified directory, it is possible that either the file was not created due to an interruption in the training process or it may have been inadvertently moved or deleted.

Creating an empty 'best.pt' file is not a valid solution since the file needs to contain specific data serialized in a format that the PyTorch framework can understand. An empty file or a file with invalid contents will cause the 'Invalid magic number; corrupt file?' error you're seeing. Here's what you can do:
If you continue to have difficulties, you may refer to the HUB documentation to ensure your training setup and process are configured correctly. Training checkpoints and the way to continue training from them should also be covered in the documentation, which may help prevent the need to start over. I understand how frustrating it can be to encounter such issues, especially after a significant amount of training time. I hope this guidance is helpful, and we're here for any further assistance you may need. Good luck! 🤞 |
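As a rough illustration of the point about empty or invalid files, the sketch below inspects a checkpoint using only the Python standard library, without loading it. The helper name `checkpoint_status` is hypothetical and not part of Ultralytics or PyTorch. It assumes the file was written in the zip-based format that `torch.save` uses by default since PyTorch 1.6, so a legacy-format checkpoint would be reported as "not a zip archive" even if it is valid. It also checks `last.pt`, which Ultralytics training normally writes next to `best.pt`.

```python
from pathlib import Path
import zipfile


def checkpoint_status(path):
    """Classify a checkpoint file without importing PyTorch (hypothetical helper)."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    if p.stat().st_size == 0:
        # An empty placeholder file cannot be deserialized by torch.load.
        return "empty"
    # Assumption: the file was saved in torch.save's default zip-based format.
    if not zipfile.is_zipfile(p):
        return "not a zip archive"
    return "looks plausible"


if __name__ == "__main__":
    # Path taken from the error message in this issue.
    weights = Path("runs/detect/train/weights")
    for name in ("best.pt", "last.pt"):
        print(name, "->", checkpoint_status(weights / name))
```

A "looks plausible" result only means the file has the expected container format; it does not guarantee that the weights inside are complete.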
@Fistcar If you have been manually creating https://colab.research.google.com/drive/1vW8xNoNi89Y4yWratNVUpqPp3d-9bY45#scrollTo=-xtsX6NxdxHz |
@kalenmike Your notebook allowed the model to be used in the hub and on my phone. Thank you. |
@Fistcar This is a limitation we are trying to fix at the moment. The problem arises when training resumes on a fresh instance and never reaches the best mAP again. Because training then completes without outperforming an epoch from before the resume, and the environment no longer has that earlier best epoch saved, the final upload fails. This can be avoided by ensuring that your environment does not get reset during resume. |
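One way to reduce the impact of an environment reset, sketched here as an assumption rather than an official HUB workflow, is to copy the checkpoint files to storage that survives the reset and copy them back before resuming. The helper `copy_checkpoints` and the backup location in the usage comment are hypothetical; whether HUB accepts restored files in this way is not confirmed in this thread.

```python
import shutil
from pathlib import Path


def copy_checkpoints(src_dir, dst_dir):
    """Copy every .pt file from src_dir into dst_dir (hypothetical helper).

    Returns the sorted list of copied file names.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    if src.is_dir():
        for pt in sorted(src.glob("*.pt")):
            shutil.copy2(pt, dst / pt.name)  # copy2 also keeps timestamps
            copied.append(pt.name)
    return copied


# Usage sketch (paths are assumptions, adjust to your environment):
#   backup:  copy_checkpoints("runs/detect/train/weights", "/path/to/persistent/backup")
#   restore: copy_checkpoints("/path/to/persistent/backup", "runs/detect/train/weights")
```

Running the backup call periodically during training keeps a copy of the most recent `best.pt` outside the instance that may be reset.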
@Fistcar Glad to hear. We are working on a way to avoid these issues but for the moment we can only fix them if they occur. |
My training completed, but I can't find the best.pt file. It says results were saved to runs/detect/train. My folders include runs/detect/predict, which contains output video files. I wish to retrain because the output is low quality. |
Hello @Ray150789! It appears that your training process completed but the
1. Check Saved Directories
Verify the exact location of your training output. By default, results are saved to
2. Use
|
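The directory check suggested above can be sketched with the standard library. Ultralytics numbers repeated runs (train, train2, and so on), so the weights may sit in a sibling of the folder you expect. The helper `find_weights` is hypothetical and simply searches a root folder recursively for `best.pt` and `last.pt`, newest first; the default root `runs` is taken from the paths quoted in this thread.

```python
from pathlib import Path


def find_weights(root="runs", names=("best.pt", "last.pt")):
    """Return matching checkpoint paths under root, newest first (hypothetical helper)."""
    root = Path(root)
    if not root.is_dir():
        return []
    hits = [p for p in root.rglob("*.pt") if p.name in names]
    # Sort by modification time so the most recent run is listed first.
    return sorted(hits, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in find_weights():
        print(path)
```

If this prints nothing, no checkpoint with those names exists under the root folder, which would match the "No such file or directory" error.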
Search before asking
HUB Component
Training
Bug
I've been trying to train a model in the HUB to run on my phone. The training got to epoch 97 out of 100 and simply errored out. I've tried using Firefox and Edge, but the same errors occur. The error was 'No such file or directory: 'runs/detect/train/weights/best.pt'' and I could find no way to move that file from the HUB to Google Colab. I created an empty best.pt file, but that just seemed to cause more errors, as now I see 'raise RuntimeError("Invalid magic number; corrupt file?") EOFError: Ran out of input'. How can I fix this? I've wasted over 5 computer hours on Google Colab simply trying to finish the training of this network. I do not want to start training all over again. The only reason I am using the HUB is to test the network out on my phone.
Environment
Ultralytics HUB Version
v0.1.31
Client User Agent
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0
Operating System
Win32
Browser Window Size
2000 x 1038
Server Timestamp
1701285411
Minimal Reproducible Example
Additional
No response