Wait for Azure Batch job quota to clear before submitting a new job #5575

Open
adamrtalbot opened this issue Dec 5, 2024 · 0 comments

@adamrtalbot
Collaborator

New feature

Occasionally, a pipeline can hit the Azure Batch active job quota, which fails the run with this error:

Error executing process > 'NFCORE_VIRALRECON:ILLUMINA:VARIANTS_IVAR:IVAR_VARIANTS_TO_VCF (SAMPLE2_PE)'

Caused by:
  Status code 409, "{
    "odata.metadata":"https://name.region.batch.azure.com/$metadata#Microsoft.Azure.Batch.Protocol.Entities.Container.errors/@Element","code":"ActiveJobAndScheduleQuotaReached","message":{
      "lang":"en-US","value":"Active job and job schedule quota for the account has been reached.\nRequestId:d28008e3-4018-41cc-90ff-6d96a7952ea3\nTime:2024-12-05T02:43:12.5743776Z"
    }
  }"

We could catch this error and wait for the job quota to clear instead of failing. The behaviour could be controlled by an azure.batch config option.
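To make "catch this error" concrete, here is a minimal sketch in Python (not Nextflow code) of how the quota error could be told apart from other 409 responses. It assumes the error body is the JSON shown above, with the error name in the top-level `code` field. The function name `is_job_quota_error` is hypothetical and only for illustration.

```python
import json

# Error code taken from the 409 response body quoted in this issue.
QUOTA_ERROR_CODE = "ActiveJobAndScheduleQuotaReached"


def is_job_quota_error(status_code: int, body: str) -> bool:
    """Return True when an error response reports that the active job quota was hit.

    Hypothetical helper: assumes the body is JSON with a top-level "code" field,
    as in the error message above.
    """
    if status_code != 409:
        return False
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Body is missing or not valid JSON, so it cannot be identified as a quota error.
        return False
    return isinstance(payload, dict) and payload.get("code") == QUOTA_ERROR_CODE
```

Checking the `code` field rather than the status alone matters because 409 is a generic conflict status, so other conflicts should still fail as they do today.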

Usage scenario

Set azure.batch.behaviourOnJobLimit = 'retry' and Nextflow will retry the job submission with a backoff until the run is cancelled. Alternatively, set it to 'error' to keep the current behaviour and raise an error.

While retrying, raise a warning to the console so the user is aware that the pipeline is waiting on the job quota.

This would allow pipelines to keep running, albeit stuck in a queue. As existing jobs are cleared, the queued submissions would go through and the pipelines would resume.
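The usage scenario above could be sketched roughly as follows. This is a Python illustration of the proposed behaviour, not an implementation against the Nextflow codebase. `JobQuotaReached`, `submit_with_job_limit_policy`, and the delay values are all assumptions made for the example; only the 'retry' and 'error' values come from this proposal.

```python
import logging
import time

log = logging.getLogger("azure_batch_sketch")


class JobQuotaReached(Exception):
    """Hypothetical stand-in for the 409 ActiveJobAndScheduleQuotaReached error."""


def submit_with_job_limit_policy(submit, behaviour="error",
                                 base_delay=30.0, max_delay=600.0,
                                 sleep=time.sleep):
    """Call submit() and apply the proposed behaviourOnJobLimit policy.

    behaviour="error": re-raise the quota error immediately (current behaviour).
    behaviour="retry": warn, wait with exponential backoff, and try again
    until the submission succeeds or the caller cancels the run.
    The delay values are illustrative assumptions, not proposed defaults.
    """
    if behaviour not in ("retry", "error"):
        raise ValueError(f"Unknown behaviourOnJobLimit value: {behaviour!r}")
    delay = base_delay
    while True:
        try:
            return submit()
        except JobQuotaReached:
            if behaviour == "error":
                raise
            # Tell the user why the pipeline appears to be stuck.
            log.warning("Azure Batch job quota reached; retrying in %.0f s", delay)
            sleep(delay)
            # Double the wait each time, capped at max_delay.
            delay = min(delay * 2, max_delay)
```

Capping the backoff keeps a long-waiting pipeline checking at a predictable interval, so it picks up freed quota reasonably soon after other jobs finish.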

Suggest implementation

tbc

Related: #4792
