429 too many requests #236
Comments
Hi, have you deployed the model via Azure AI Foundry (https://ai.azure.com) or Azure ML Studio (https://ml.azure.com)? It sounds like you're running Azure Machine Learning endpoints. The max_concurrent_requests setting is indeed crucial for handling higher request loads.

Check quotas and limits: ensure that your subscription has the necessary quotas and limits for the number of concurrent requests. You can view and request quota increases through the Azure portal.

Azure support case: since the model is hosted by Microsoft, you might need to open a support case with Azure to request an increase in the max_concurrent_requests setting. This is often the recommended approach when you can't directly modify the configuration.

The Azure documentation provides additional insights and workarounds:
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-quotas?view=azureml-api-2
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-quotas?view=azureml-api-2#endpoint-quota-increases
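While a quota increase is pending, a common client-side mitigation for 429 responses is to retry with exponential backoff. Below is a minimal sketch using only the standard library; the endpoint URL, API key, and payload shape are placeholders you would replace with your own deployment's values, and this only smooths over throttling rather than raising the underlying limit.

```python
import json
import random
import time
import urllib.error
import urllib.request

def call_endpoint_with_backoff(url, api_key, payload, max_retries=5):
    """POST JSON to a scoring endpoint, retrying on HTTP 429 with backoff.

    url, api_key, and the payload shape are placeholders for your own
    deployment; adapt them before use.
    """
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    for attempt in range(max_retries):
        request = urllib.request.Request(url, data=body, headers=headers)
        try:
            with urllib.request.urlopen(request) as response:
                return json.loads(response.read())
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # only throttling responses are retried
            # Honor Retry-After when the service sends it; otherwise fall
            # back to exponential backoff with jitter.
            retry_after = err.headers.get("Retry-After")
            delay = float(retry_after) if retry_after else (2 ** attempt) + random.random()
            time.sleep(delay)
    raise RuntimeError(f"Still throttled after {max_retries} retries")
```

If the service includes a Retry-After header on its 429 responses, honoring it (as above) is usually gentler on the endpoint than a fixed backoff schedule.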
Hi @leestott, I made the deployment through both Azure ML Studio and Azure AI Foundry.

Case 1: In Azure AI Foundry (Azure AI Studio) I can only deploy the serverless version of the Phi family. Under Customization I can only see Model Version and Resource Location; there is no option for max_concurrent_requests. I am facing the same 429 issue for the serverless endpoints.

Case 2: For the model deployed using AML, I did raise a request for a quota increase, but unfortunately, after long delays, the AML quota team still wanted me to explore the max_concurrent_requests option. I am not sure how to proceed from here. The only (painful) remaining option is to host the model in a custom container by downloading the weights. That would let me create an environment and score file where I can specify parameters such as max_concurrent_requests, but the big downside of that approach is that whenever Microsoft releases a new version of the weights, I need to redeploy. Any assistance, suggestion, or feedback is greatly appreciated.
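For reference, the scoring script used in the custom-container route mentioned above follows Azure ML's init()/run() contract. The sketch below shows only that contract; the actual model loading and inference calls (e.g. loading the downloaded Phi weights with your framework of choice) are placeholders you would fill in.

```python
# score.py -- minimal sketch of an Azure ML scoring script (init/run contract).
# Model loading and inference are placeholders; adapt them to the Phi weights
# you download.
import json
import os

model = None

def init():
    """Called once when the deployment starts; load the model here."""
    global model
    # Azure ML mounts registered model files under AZUREML_MODEL_DIR.
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")
    # e.g. model = AutoModelForCausalLM.from_pretrained(model_dir)  # assumed
    model = model_dir  # placeholder so this sketch is self-contained

def run(raw_data):
    """Called per request; raw_data is the JSON request body as a string."""
    payload = json.loads(raw_data)
    prompt = payload.get("prompt", "")
    # Replace with real inference, e.g. model.generate(...)
    return {"echo": prompt, "model_dir": model}
```

Concurrency itself is then controlled by the deployment's request_settings rather than by this script, so the redeployment burden on new weight releases is the real trade-off here.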
Hi Team,
I have deployed the Phi-3.5 model both as a serverless endpoint and on custom compute, but even under a moderately higher request load it throws error 429. When I contacted support, they informed me that I need to change max_concurrent_requests: https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-deployment-managed-online?view=azureml-api-2#requestsettings
Unfortunately, I can't find an option to edit that config anywhere, as the model is hosted by MSFT. Any help would be greatly appreciated.
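For a self-managed (non-Microsoft-hosted) managed online deployment, the setting in the linked reference lives under request_settings in the deployment YAML. A sketch is below; the name, model, instance type, and all values are illustrative placeholders, and this only applies when you own the deployment rather than a serverless endpoint.

```yaml
# Illustrative managed online deployment YAML; values are placeholders.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: phi35-deployment
endpoint_name: phi35-endpoint
model: azureml:phi-3-5:1
instance_type: Standard_NC24ads_A100_v4
instance_count: 2
request_settings:
  request_timeout_ms: 90000
  max_concurrent_requests_per_instance: 4
```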