How to solve the problem of experiments stalling? #42

Open
YUjh0729 opened this issue May 30, 2024 · 3 comments

Comments

@YUjh0729

Hello,

When I train the model, the experiment stops at a certain epoch and doesn't continue training. The GPU usage is at 1% and the memory usage is 12GB, indicating that the experiment is still running. However, it stays stuck at the current epoch for an entire night, preventing the experiment from progressing. What could be the problem? Can you help explain this?
[Screenshot 2024-05-30 093003]

Thank you.

@zcyrique

zcyrique commented Aug 16, 2024

Hi @YUjh0729,
I'm having the same issue as you! Were you able to solve it? Any help would be greatly appreciated.
@JunMa11, any help on this one?
Thank you!

@YUjh0729
Author

Hi @zcyrique,
I've tried all the solutions suggested in the other issues, but none of them resolved the problem.

@zcyrique

Thank you, @YUjh0729, for replying. Let's wait for help from anyone who may have solved this issue.
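
In the meantime, here is a minimal sketch of one way to find out where a stalled run is stuck. It is not a fix, and it assumes the training script is Python and that you can wrap its epoch loop. `stall_watchdog` and `train_one_epoch` are hypothetical names used only for illustration and are not part of this project. The sketch uses the standard-library `faulthandler` module to print the stack of every Python thread if an epoch runs longer than a chosen timeout.

```python
import faulthandler
import sys
import time
from contextlib import contextmanager


@contextmanager
def stall_watchdog(timeout_s, file=sys.stderr):
    """Dump all Python thread stacks to `file` every `timeout_s` seconds
    for as long as the wrapped block is still running."""
    # `file` must be a real file object (it needs a working fileno()).
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file)
    try:
        yield
    finally:
        # The block finished (or raised), so disarm the watchdog.
        faulthandler.cancel_dump_traceback_later()


def train_one_epoch(epoch):
    """Hypothetical stand-in for the real per-epoch training code."""
    time.sleep(0.01)
    return epoch


if __name__ == "__main__":
    for epoch in range(3):
        # Assumption: a healthy epoch finishes well within 30 minutes.
        with stall_watchdog(timeout_s=1800):
            train_one_epoch(epoch)
```

If the run stalls, the dumped tracebacks show which line each thread is waiting on, which may help tell apart a hang in data loading from one in the training step itself. Only the main process is covered; worker subprocesses would need their own watchdog.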


2 participants