Feature Request: Better error message when usage limit arrived. (HTTPTooManyRequests) #1893

Open
orena1 opened this issue Dec 23, 2024 · 3 comments · May be fixed by #1895

Comments

@orena1

orena1 commented Dec 23, 2024

Is your feature request related to a problem? Please describe.

I encountered frequent errors like the following when running Neptune with NeMo:

Experiencing connection interruptions. Will try to reestablish communication with Neptune. Internal exception was: HTTPTooManyRequests

Once this error occurred, the job would hang and not progress, eventually resulting in a cu10 error message. The only solution I found was to disable Neptune entirely.

After contacting support, I learned this issue is caused by reaching the default workspace usage limit. Here is their response:

Hi Oren,
Thanks for reaching out. Yes, it looks like you're reaching the default workspace usage limit

It would be much better if the error message directly indicated the actual problem, such as:
Experiencing connection interruptions with Neptune. It appears you are reaching the default workspace usage limit. Please review your workspace limits or contact support for assistance.

This would have saved me time (it took 4 hours to diagnose the issue) and prevented frustration. If I had not reached out to support, our company might have abandoned the idea of using Neptune entirely.

Additionally, it would be beneficial if the job did not fail or freeze in cases where usage limits are exceeded. A graceful handling of such situations would improve the user experience.
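To illustrate the two requests above (a message that names the likely cause, and handling that does not crash the job), here is a minimal sketch. Everything in it is hypothetical: `TooManyRequests`, `send_with_backoff`, and `RATE_LIMIT_HINT` are made-up names for illustration and are not part of Neptune's client. The hint text is the wording proposed above.

```python
import time

# Hypothetical names throughout; this is not Neptune's client code.
RATE_LIMIT_HINT = (
    "Experiencing connection interruptions with Neptune. "
    "It appears you are reaching the default workspace usage limit. "
    "Please review your workspace limits or contact support for assistance."
)


class TooManyRequests(Exception):
    """Stand-in for an HTTP 429 error raised by a client library."""


def send_with_backoff(send, payload, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep, log=print):
    """Call send(payload), retrying on TooManyRequests with exponential backoff.

    Returns True if the payload was sent and False if every attempt was
    rate-limited. It never raises on rate limiting, so the caller's
    training loop keeps running.
    """
    for attempt in range(max_attempts):
        try:
            send(payload)
            return True
        except TooManyRequests:
            if attempt == 0:
                log(RATE_LIMIT_HINT)  # explain the cause once, not on every retry
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... by default
    return False
```

The `sleep` and `log` parameters are injectable only so the behaviour can be checked without real delays; a real client would likely also honour any retry hint the server sends.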

Additional context:

Where can I check whether I have indeed reached the usage limit? The dashboard currently only shows storage limits, not connection limits. Clarifying this in the UI or documentation would also be helpful.

Thank you!

@orena1 orena1 changed the title Feature Request: Better error message when usage limit arrived. Feature Request: Better error message when usage limit arrived. (HTTPTooManyRequests) Dec 23, 2024
@SiddhantSadangi SiddhantSadangi self-assigned this Dec 24, 2024
@SiddhantSadangi SiddhantSadangi added this to the 1.14 milestone Dec 24, 2024
@SiddhantSadangi
Member

Hey @orena1 👋

Thank you for the detailed feature request. We do have a page in the docs that deals with this error: https://docs.neptune.ai/help/reducing_requests/, but I'll add more details, as you've mentioned, with a link to this page in the error message itself 📝

> Additionally, it would be beneficial if the job did not fail or freeze in cases where usage limits are exceeded. A graceful handling of such situations would improve the user experience.

The job doesn't actually freeze. Neptune's Lightning integration (on which NeMo's Neptune integration is built) calls wait() internally to ensure that all logging calls have reached the server before execution proceeds. When you are already rate-limited, this wait can make it seem as if training has frozen when it hasn't. If you check the Neptune WebApp, you should see monitoring metrics being updated (unless a large file, such as a model checkpoint, is being uploaded).
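The blocking behaviour described above can be reproduced with a toy model. This is only a sketch under stated assumptions: it uses a plain `queue.Queue` and a background thread to stand in for a logging client, with `pending.join()` playing the role of wait(). It does not reflect how Neptune or the Lightning integration is implemented, and `run_demo` is a hypothetical name.

```python
import queue
import threading
import time


def run_demo(num_items=5, seconds_per_item=0.05):
    """Return (items_sent, seconds_blocked) for a simulated rate-limited flush."""
    pending = queue.Queue()
    sent = []

    def sender():
        # Background thread: drains the queue slowly, as if throttled by the server.
        while True:
            item = pending.get()
            time.sleep(seconds_per_item)  # simulated rate-limit delay
            sent.append(item)
            pending.task_done()

    threading.Thread(target=sender, daemon=True).start()

    for step in range(num_items):
        pending.put(step)  # "logging calls" return immediately

    start = time.monotonic()
    pending.join()  # plays the role of wait(): blocks until everything is sent
    blocked = time.monotonic() - start
    return sent, blocked


if __name__ == "__main__":
    sent, blocked = run_demo()
    print(f"sent {len(sent)} items, blocked for {blocked:.2f}s")
```

The main thread appears stuck at `join()` while the sender is still making steady progress, which matches the "seems frozen, but isn't" description: the slower the drain, the longer the apparent hang.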

> Where can I check whether I have indeed reached the usage limit? The dashboard currently only shows storage limits, not connection limits. Clarifying this in the UI or documentation would also be helpful.

Currently, this information is only available on the back end. I'll pass this feedback on to the product team to see if we can include it on the dashboard somehow 📝

@SiddhantSadangi SiddhantSadangi linked a pull request Dec 24, 2024 that will close this issue
2 tasks
@SiddhantSadangi SiddhantSadangi removed this from the 1.14 milestone Dec 24, 2024
@SiddhantSadangi
Member

@orena1 - We have a PR to add a more descriptive error message, complete with links to the docs and who to contact for support.

Can you install this version of neptune from source to check whether it works for you?

pip install git+https://github.com/neptune-ai/neptune-client-scale.git@ss/1.x/HTTPTooManyRequests

@orena1
Author

orena1 commented Dec 24, 2024

Thanks @SiddhantSadangi, that is much more informative! I can't really test it, as these errors have stopped for now.
