Catch errors when trying to provision an instance type that is not available. #26

Open
willprice opened this issue Dec 11, 2020 · 0 comments
Labels
enhancement New feature or request

Comments

@willprice
When I set an instance type in limits.yaml which I do not have permission to launch, the following occurs:

citc $ cat limits.yaml
g3s.xlarge: 2
citc $ finish

user $ srun --pty -c 2  -I bash

citc $ tail /var/log/slurm/elastic.log
2020-12-11 10:57:04,484 startnode  ERROR     problem launching instance: An error occurred (VcpuLimitExceeded) when calling the RunInstances operation: You have requested more vCPU capacity than your current vCPU limit of 0 allows for the instance bucket that the specified instance type belongs to. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.

From the user's perspective, srun simply hangs and the underlying error stays hidden. It would be better if srun failed fast instead of hanging until a timeout, or if limits.yaml were checked when finish is executed to determine whether such an instance can actually be launched.
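
As a rough sketch of the "fail fast" idea, the launch code could classify the error it already logs and treat quota or permission errors as permanent rather than waiting for a timeout. Everything named here is hypothetical and not taken from the citc code base: the helper functions, the `PERMANENT_CODES` set (only `VcpuLimitExceeded` comes from the log above; the other codes are assumptions about which EC2 errors are not worth retrying), and the choice to mark the node down with `scontrol`. Whether a downed node makes srun return promptly would still need to be verified.

```python
import re

# Error codes treated as permanent in this sketch. VcpuLimitExceeded is taken
# from the log above; the remaining codes are assumptions.
PERMANENT_CODES = {
    "VcpuLimitExceeded",
    "InstanceLimitExceeded",
    "UnauthorizedOperation",
    "Unsupported",
}

# Matches the message format seen in elastic.log:
# "An error occurred (<Code>) when calling the <Operation> operation: ..."
_CODE_RE = re.compile(r"An error occurred \((\w+)\) when calling the (\w+) operation")


def extract_error_code(message):
    """Return the AWS error code embedded in the message, or None if absent."""
    match = _CODE_RE.search(message)
    return match.group(1) if match else None


def is_permanent_failure(message):
    """True if the error is one that retrying or waiting will not fix."""
    return extract_error_code(message) in PERMANENT_CODES


def mark_down_command(node, message):
    """Build (but do not run) an scontrol command that marks the node down."""
    code = extract_error_code(message) or "unknown"
    return [
        "scontrol",
        "update",
        f"nodename={node}",
        "state=down",
        f"reason=launch failed: {code}",
    ]
```

If `is_permanent_failure` returns true for the logged message, the node start script could pass the list from `mark_down_command` to `subprocess.run` so that the failure reason becomes visible through Slurm instead of only in elastic.log.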

@milliams added the enhancement (New feature or request) label on Dec 11, 2020.