Problem resuming training in Google Colab #671

Closed
1 task done
sebasmej opened this issue Apr 30, 2024 · 3 comments
Labels
bug (Something isn't working), duplicate (This issue or pull request already exists)

Comments

@sebasmej

Search before asking

  • I have searched the HUB issues and found no similar bug report.

HUB Component

Training

Bug

I am training a model in Google Colab (it is not the first model I have trained this way). When I try to resume training by running the following commands:

%pip install ultralytics  # install
from ultralytics import YOLO, checks, hub
checks()  # checks

hub.login('my_API_KEY')
model = YOLO('my_MODEL_ID')
results = model.train()

the following error message appears:

requirements: Ultralytics requirement ['hub-sdk>=0.0.6'] not found, attempting AutoUpdate...
Collecting hub-sdk>=0.0.6
  Downloading hub_sdk-0.0.8-py3-none-any.whl (40 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.9/40.9 kB 2.4 MB/s eta 0:00:00
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from hub-sdk>=0.0.6) (2.31.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->hub-sdk>=0.0.6) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->hub-sdk>=0.0.6) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->hub-sdk>=0.0.6) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->hub-sdk>=0.0.6) (2024.2.2)
Installing collected packages: hub-sdk
Successfully installed hub-sdk-0.0.8

requirements: AutoUpdate success ✅ 6.0s, installed 1 package: ['hub-sdk>=0.0.6']
requirements: ⚠️ Restart runtime or rerun command for updates to take effect

Ultralytics HUB: New authentication successful ✅
Ultralytics HUB: View model at https://hub.ultralytics.com/models/6SUZnsAo0z0y6gld7lpp 🚀
Downloading https://storage.googleapis.com/ultralytics-hub.appspot.com/users/gR39oPibZKaU7n6mUI0WE1H1CQH2/models/6SUZnsAo0z0y6gld7lpp/epoch-32.pt to 'epoch-32.pt'...
⚠️ Download failure, retrying 1/3 https://storage.googleapis.com/ultralytics-hub.appspot.com/users/gR39oPibZKaU7n6mUI0WE1H1CQH2/models/6SUZnsAo0z0y6gld7lpp/epoch-32.pt?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=firebase-adminsdk-jsjt9%40ultralytics-hub.iam.gserviceaccount.com%2F20240430%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240430T070930Z&X-Goog-Expires=900&X-Goog-SignedHeaders=host&X-Goog-Signature=5bc8969abb8a1e1ee6b7518609a6a883f6276e7f9ee851ce01edf394f85b58a1e595a775690399a9f569b14cbfe7b3fd299049484b41a34e9cdc8002ed711d399bd0d2b61c01776b258a87ba3bf78b786a522e601f1413e508d8e3d61c6f0d89e76fe6cdc64a0b8e726cf24b0c701c9a6a679cce954bd385cd4714d92ba336c9bb6faea48f3bcb3eecfecdaa7e1fb7b4316bc34d042a31c79f79c4ea764d54e3632132246cbe6e9f37d494f87f9361d0251673517fbd03a6522650f9c3cfedfaf96526ef8f4a64a1da97e8d7904493489c484e339b72390012ad33d5faf7c172e810364057072fd535abb3f4368e96b0a8aa132eb2402b5b93959369719b5e59...
---------------------------------------------------------------------------
UnpicklingError                           Traceback (most recent call last)
<ipython-input-2-600de03de6f2> in <cell line: 3>()
      1 hub.login('67f19bbd86bcc04db7747d501c4e11246ac092e81a')
      2 
----> 3 model = YOLO('https://hub.ultralytics.com/models/6SUZnsAo0z0y6gld7lpp')
      4 results = model.train()

6 frames
/usr/local/lib/python3.10/dist-packages/ultralytics/models/yolo/model.py in __init__(self, model, task, verbose)
     21         else:
     22             # Continue with default YOLO initialization
---> 23             super().__init__(model=model, task=task, verbose=verbose)
     24 
     25     @property

/usr/local/lib/python3.10/dist-packages/ultralytics/engine/model.py in __init__(self, model, task, verbose)
    149             self._new(model, task=task, verbose=verbose)
    150         else:
--> 151             self._load(model, task=task)
    152 
    153     def __call__(

/usr/local/lib/python3.10/dist-packages/ultralytics/engine/model.py in _load(self, weights, task)
    238 
    239         if Path(weights).suffix == ".pt":
--> 240             self.model, self.ckpt = attempt_load_one_weight(weights)
    241             self.task = self.model.args["task"]
    242             self.overrides = self.model.args = self._reset_ckpt_args(self.model.args)

/usr/local/lib/python3.10/dist-packages/ultralytics/nn/tasks.py in attempt_load_one_weight(weight, device, inplace, fuse)
    804 def attempt_load_one_weight(weight, device=None, inplace=True, fuse=False):
    805     """Loads a single model weights."""
--> 806     ckpt, weight = torch_safe_load(weight)  # load ckpt
    807     args = {**DEFAULT_CFG_DICT, **(ckpt.get("train_args", {}))}  # combine model and default args, preferring model args
    808     model = (ckpt.get("ema") or ckpt["model"]).to(device).float()  # FP32 model

/usr/local/lib/python3.10/dist-packages/ultralytics/nn/tasks.py in torch_safe_load(weight)
    730             }
    731         ):  # for legacy 8.0 Classify and Pose models
--> 732             ckpt = torch.load(file, map_location="cpu")
    733 
    734     except ModuleNotFoundError as e:  # e.name is missing module name

/usr/local/lib/python3.10/dist-packages/torch/serialization.py in load(f, map_location, pickle_module, weights_only, mmap, **pickle_load_args)
   1038             except RuntimeError as e:
   1039                 raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
-> 1040         return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
   1041 
   1042 

/usr/local/lib/python3.10/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
   1256             "functionality.")
   1257 
-> 1258     magic_number = pickle_module.load(f, **pickle_load_args)
   1259     if magic_number != MAGIC_NUMBER:
   1260         raise RuntimeError("Invalid magic number; corrupt file?")

UnpicklingError: invalid load key, '<'.

Environment

Google Colab

Minimal Reproducible Example

  1. Log in to HUB
  2. Search for the model to train
  3. Click to copy the Colab code
  4. Follow the steps in the Google Colab notebook
  5. The error appears

Additional

No response

@sebasmej added the bug label Apr 30, 2024

👋 Hello @sebasmej, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:

  • Quickstart. Start training and deploying YOLO models with HUB in seconds.
  • Datasets: Preparing and Uploading. Learn how to prepare and upload your datasets to HUB in YOLO format.
  • Projects: Creating and Managing. Group your models into projects for improved organization.
  • Models: Training and Exporting. Train YOLOv5 and YOLOv8 models on your custom datasets and export them to various formats for deployment.
  • Integrations. Explore different integration options for your trained models, such as TensorFlow, ONNX, OpenVINO, CoreML, and PaddlePaddle.
  • Ultralytics HUB App. Learn about the Ultralytics App for iOS and Android, which allows you to run models directly on your mobile device.
    • iOS. Learn about YOLO CoreML models accelerated on Apple's Neural Engine on iPhones and iPads.
    • Android. Explore TFLite acceleration on mobile devices.
  • Inference API. Understand how to use the Inference API for running your trained models in the cloud to generate predictions.

If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

@pderrenger
Member

Hello! It looks like the model weights did not download correctly from the server, which left a corrupted file on disk. This can occasionally happen because of network connectivity problems or server-side issues.
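As a rough illustration of why a bad download surfaces as this particular error: `invalid load key, '<'` is what Python's `pickle` reports when the first byte of the data is `<`, which is how an HTML or XML error page begins. The sketch below reproduces the message with the standard library only. The payload is a made-up stand-in, not the actual response from the server, and `unpickle_error_for` is a hypothetical helper name.

```python
import pickle


def unpickle_error_for(payload: bytes) -> str:
    """Return the UnpicklingError message raised for payload, or '' if it loads."""
    try:
        pickle.loads(payload)
    except pickle.UnpicklingError as exc:
        return str(exc)
    return ""


# Hypothetical stand-in for a markup error body saved under a .pt file name.
fake_error_body = b"<?xml version='1.0' encoding='UTF-8'?><Error>example</Error>"

message = unpickle_error_for(fake_error_body)
print(message)
```

If the printed message matches the one in the traceback, that suggests the saved `epoch-32.pt` holds a text response rather than a checkpoint, though it does not say why the download failed.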

Here's a quick checklist to try and resolve this problem:

  1. Rerun the Training Cell: Simply rerunning the command can resolve the issue if it was caused by a temporary connectivity problem.
  2. Check Internet Connection: Ensure your Colab notebook has a stable internet connection. Switching to a different network can sometimes help.
  3. Clear Colab Environment: Restart your Colab runtime and clear any cached data. It's also good practice to delete any corrupted weight files that have already been downloaded.
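The "delete any corrupted weight files" step can be sketched as a small check before rerunning the cell. This is a minimal sketch, assuming the corrupted file starts with `<` as the `invalid load key, '<'` message suggests. The helper names `looks_like_markup` and `remove_if_markup` are hypothetical and are not part of Ultralytics or PyTorch.

```python
from pathlib import Path


def looks_like_markup(path: Path, probe: int = 64) -> bool:
    """Return True if the first non-whitespace byte of the file is '<'."""
    with path.open("rb") as fh:
        head = fh.read(probe)  # only the first few bytes are needed
    return head.lstrip().startswith(b"<")


def remove_if_markup(path: Path) -> bool:
    """Delete path if it exists and looks like HTML/XML; report whether it was removed."""
    if path.is_file() and looks_like_markup(path):
        path.unlink()
        return True
    return False
```

For the file named in the log above, calling `remove_if_markup(Path("epoch-32.pt"))` before rerunning the training cell would clear a markup file and leave anything else untouched. The check is only a heuristic: a file that does not start with `<` can still be truncated or otherwise damaged.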

Should the issue persist after these steps, please open a new issue with the details of the error from the rerun so we can investigate further. Some issues are tied to transient conditions on the server or network, and fresh context helps us tell whether there is a new problem.

Thank you for reaching out! Your contributions help the community and the development of our platform. 🚀

@sergiuwaxmann added the duplicate label May 6, 2024
@sergiuwaxmann
Member

Closing this issue as it is duplicated by #674.

3 participants