429 too many requests #236
Comments
Hi, have you deployed the model via Azure AI Foundry (https://ai.azure.com) or Azure ML Studio (https://ml.azure.com)? It sounds like you're running Azure Machine Learning endpoints. The max_concurrent_requests setting is indeed crucial for handling higher request loads.

Check quotas and limits: ensure that your subscription has the necessary quotas and limits for the number of concurrent requests. You can view and request quota increases through the Azure portal.

Azure support case: since the model is hosted by Microsoft, you might need to open a support case with Azure to request an increase in the max_concurrent_requests setting. This is often the recommended approach when you can't directly modify the configuration.

The Azure documentation provides additional insights and workarounds:
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-quotas?view=azureml-api-2
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-quotas?view=azureml-api-2#endpoint-quota-increases
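While a quota increase is pending, a common client-side mitigation for 429 responses is to retry with exponential backoff. Below is a minimal sketch using only the standard library; the endpoint URL, API key, and payload shape are placeholders you would replace with your own deployment's values, and this only smooths over throttling rather than raising the underlying limit.

```python
import json
import random
import time
import urllib.error
import urllib.request

def call_endpoint_with_backoff(url, api_key, payload, max_retries=5):
    """POST JSON to a scoring endpoint, retrying on HTTP 429 with backoff.

    url, api_key, and the payload shape are placeholders for your own
    deployment; adapt them before use.
    """
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    for attempt in range(max_retries):
        request = urllib.request.Request(url, data=body, headers=headers)
        try:
            with urllib.request.urlopen(request) as response:
                return json.loads(response.read())
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # only throttling responses are retried
            # Honor Retry-After when the service sends it; otherwise fall
            # back to exponential backoff with jitter.
            retry_after = err.headers.get("Retry-After")
            delay = float(retry_after) if retry_after else (2 ** attempt) + random.random()
            time.sleep(delay)
    raise RuntimeError(f"Still throttled after {max_retries} retries")
```

If the service includes a Retry-After header on its 429 responses, honoring it (as above) is usually gentler on the endpoint than a fixed backoff schedule.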
Hi @leestott, I made the deployment through both Azure ML Studio and Azure AI Foundry.

Case 1: In Azure AI Foundry (Azure AI Studio) I can only deploy the serverless version of the Phi family. Under Customization I can only see Model Version and Resource Location; there is no option for max_concurrent_requests. I am facing the same 429 issue for the serverless endpoints.

Case 2: For the model deployed using AML, I did raise a request for a quota increase, but unfortunately, after long delays, the AML quota team still wanted me to explore the max_concurrent_requests option. I am not sure how to proceed from here. The only (painful) remaining option is to host the model in a custom container by downloading the weights. That would let me create an environment and score file where I can specify parameters such as max_concurrent_requests, but the big downside of that approach is that whenever Microsoft releases a new version of the weights, I need to redeploy. Any assistance, suggestion, or feedback is greatly appreciated.
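For reference, the scoring script used in the custom-container route mentioned above follows Azure ML's init()/run() contract. The sketch below shows only that contract; the actual model loading and inference calls (e.g. loading the downloaded Phi weights with your framework of choice) are placeholders you would fill in.

```python
# score.py -- minimal sketch of an Azure ML scoring script (init/run contract).
# Model loading and inference are placeholders; adapt them to the Phi weights
# you download.
import json
import os

model = None

def init():
    """Called once when the deployment starts; load the model here."""
    global model
    # Azure ML mounts registered model files under AZUREML_MODEL_DIR.
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")
    # e.g. model = AutoModelForCausalLM.from_pretrained(model_dir)  # assumed
    model = model_dir  # placeholder so this sketch is self-contained

def run(raw_data):
    """Called per request; raw_data is the JSON request body as a string."""
    payload = json.loads(raw_data)
    prompt = payload.get("prompt", "")
    # Replace with real inference, e.g. model.generate(...)
    return {"echo": prompt, "model_dir": model}
```

Concurrency itself is then controlled by the deployment's request_settings rather than by this script, so the redeployment burden on new weight releases is the real trade-off here.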
Hi Team,
I have deployed the Phi-3.5 model both as a serverless endpoint and on custom compute, but even under a moderately higher request load it throws error 429. When I contacted support, they informed me that I need to change max_concurrent_requests: https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-deployment-managed-online?view=azureml-api-2#requestsettings
Unfortunately, I can't find an option to edit that config anywhere, as the model is hosted by MSFT. Any help would be greatly appreciated.
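For a self-managed (non-Microsoft-hosted) managed online deployment, the setting in the linked reference lives under request_settings in the deployment YAML. A sketch is below; the name, model, instance type, and all values are illustrative placeholders, and this only applies when you own the deployment rather than a serverless endpoint.

```yaml
# Illustrative managed online deployment YAML; values are placeholders.
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: phi35-deployment
endpoint_name: phi35-endpoint
model: azureml:phi-3-5:1
instance_type: Standard_NC24ads_A100_v4
instance_count: 2
request_settings:
  request_timeout_ms: 90000
  max_concurrent_requests_per_instance: 4
```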