Ch10 - can't run model.py - error with hypertune #166
Make sure you are pip installing to the right Python installation:
python3 -m pip install hypertune
and
python3 -m pip show hypertune
thanks
Lak
On Tue, Feb 14, 2023 at 9:18 AM James ***@***.***> wrote:
> Hello,
> So on p.338 of the book it says:
> [image: image]
> <https://user-images.githubusercontent.com/8484188/218809208-d8b90fb4-36b0-41eb-bf9b-03187670438a.png>
> But when I run this I get the following error:
> Traceback (most recent call last):
>   File "/home/jgammerman/data-science-on-gcp/10_mlops/model.py", line 331, in <module>
>     train_and_evaluate(TRAIN_DATA_PATTERN, EVAL_DATA_PATTERN, TEST_DATA_PATTERN, OUTPUT_MODEL_DIR, OUTPUT_DIR)
>   File "/home/jgammerman/data-science-on-gcp/10_mlops/model.py", line 180, in train_and_evaluate
>     hpt = hypertune.HyperTune()
> AttributeError: module 'hypertune' has no attribute 'HyperTune'
> I have pip installed hypertune on my VM so I know it's there:
> ***@***.***:~/data-science-on-gcp/.......$ pip show hypertune
> Name: hypertune
> Version: 1.0.3
> Summary: A library for performing hyperparameter optimization with Polyaxon.
> Home-page: https://github.com/polyaxon/hypertune
> Author: Polyaxon, Inc.
> Author-email: ***@***.***
> License: Apache 2.0
> Location: /home/jgammerman/.local/lib/python3.9/site-packages
Ah, wrong hypertune! Your pip output shows the Polyaxon package. Please uninstall that one; this is the one you need
to pip install:
https://pypi.org/project/cloudml-hypertune/
thanks
Lak
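One way to check which package `import hypertune` actually resolves to is to look at the module's file path and test for the attribute that model.py needs. The sketch below is a hypothetical helper (`describe_module` is not part of the repository); it uses only the standard library and assumes nothing about either hypertune package beyond the `HyperTune` name taken from the traceback.

```python
import importlib


def describe_module(module_name, attribute):
    """Report where a module was imported from and whether it has an attribute.

    Returns None if the module cannot be imported at all.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return {
        # Path of the file that was actually imported (None for some built-ins).
        "file": getattr(module, "__file__", None),
        # True only if the module exposes the requested name.
        "has_attribute": hasattr(module, attribute),
    }
```

Calling `describe_module("hypertune", "HyperTune")` in the same interpreter that runs model.py should show which installation supplied the module; if `has_attribute` is False, the wrong package is probably the one being imported.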
On Tue, Feb 14, 2023 at 9:36 AM James ***@***.***> wrote:
> Tried that - same error as before
(1) Could you add the "export" instruction to the README instructions on
GitHub?
https://github.com/GoogleCloudPlatform/data-science-on-gcp/tree/main/10_mlops/README.md
(2) I believe that's a quota error you are getting. You don't have the
quota for one Nvidia T4 GPU. You may need to request that quota. If you
have a different GPU available, change this line appropriately:
accelerator_type=aip.AcceleratorType.NVIDIA_TESLA_T4.name,
accelerator_count=1,
(3) It should finish in a little over 2 hours:
budget_milli_node_hours=(300 if develop_mode else 2000),
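For reference, the budget in point (3) is expressed in milli node hours, where 1,000 milli node hours equal one node hour. Here is a small sketch of the conversion. The helper names are hypothetical, and node hours are a compute budget, so wall-clock time can differ from the figure computed here.

```python
def milli_node_hours_to_node_hours(milli_node_hours):
    """Convert an AutoML training budget from milli node hours to node hours."""
    return milli_node_hours / 1000


def training_budget(develop_mode):
    # Same expression as in the reply: 300 in develop mode, 2000 otherwise.
    return 300 if develop_mode else 2000


full = milli_node_hours_to_node_hours(training_budget(develop_mode=False))
dev = milli_node_hours_to_node_hours(training_budget(develop_mode=True))
print(full, dev)
```

Under these assumptions the full run has a budget of 2 node hours and the develop-mode run 0.3 node hours (18 minutes).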
On Wed, Feb 15, 2023 at 8:53 AM James ***@***.***> wrote:
> That made it work, thanks Lak.
> (By the way for anyone else at this stage, you might need to run
> export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
> to get it to work.)
> I then tried running the pipeline *train_on_vertexai.py* and it spent
> about 10 mins training before failing due to a memory error:
> RuntimeError: Training failed with: code: 8 message: "The following quota
> metrics exceed quota limits:
> aiplatform.googleapis.com/custom_model_training_nvidia_t4_gpus"
> I'm currently running the same pipeline using AutoML and it's been
> training for 2 hours so far - should it take that long?
Note the name of the quota that you need increased:
aiplatform.googleapis.com/custom_model_training_nvidia_t4_gpus
Please look in console.cloud.google.com/quotas for a quota with this name
thanks
Lak
On Thu, Feb 16, 2023 at 7:22 AM James ***@***.***> wrote:
> I'm getting the same error when trying to run hyperparameter tuning:
> google.api_core.exceptions.ResourceExhausted: 429 The following quota
> metrics exceed quota limits:
> aiplatform.googleapis.com/custom_model_training_nvidia_t4_gpus
Also, remember that if your quota is 1 GPU and you have already created a VM
with this GPU, you have exhausted the quota.
The Vertex AI managed service won't have a GPU available. Similarly, if
you are doing hyperparameter tuning with 4 parallel workers, you need 4 GPUs
in your quota
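The arithmetic described above can be sketched as follows. This only illustrates the reply's reasoning and does not call any Google Cloud API: the function names are hypothetical, and it assumes, as the reply states, that GPUs already in use count against the same quota.

```python
def gpus_required(parallel_trials, gpus_per_trial=1):
    """GPUs needed when every parallel trial holds its own accelerator(s)."""
    return parallel_trials * gpus_per_trial


def fits_quota(quota, gpus_in_use, parallel_trials, gpus_per_trial=1):
    """True if the requested trials fit in the part of the quota still free."""
    return gpus_in_use + gpus_required(parallel_trials, gpus_per_trial) <= quota


# Quota of 1 GPU, already taken by an existing VM: a single job does not fit.
print(fits_quota(quota=1, gpus_in_use=1, parallel_trials=1))
# Four parallel workers with one GPU each need a quota of at least 4.
print(fits_quota(quota=4, gpus_in_use=0, parallel_trials=4))
```

With `accelerator_count=1` as in the earlier reply, the required quota is simply the number of parallel trials plus whatever is already in use.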
On Thu, Feb 16, 2023 at 7:30 AM Lakshmanan Valliappa ***@***.***> wrote:
> Note the name of the quota that you need increased:
> aiplatform.googleapis.com/custom_model_training_nvidia_t4_gpus
> Please look in console.cloud.google.com/quotas for a quota with this name
When I navigate to console.cloud.google.com/quotas, it says that I need to upgrade to a paid account. I'm still using a managed service. (I tried creating a user-managed one a few days ago but it didn't work; the error said something about not enough GPUs currently being available. I put it down to bad timing and decided to try again later.) I guess that's the root of the problem. Will try again.