I am using the Python SDK to access Azure OpenAI (GPT-4o / GPT-4o-mini).
My usage logs show that I am well below the Tokens-per-Minute (TPM) and Requests-per-Minute (RPM) limits configured for my deployment.
Even so, I sometimes get:

```
429 Too Many Requests
Please try again later.
```
This happens intermittently, affecting small batches of requests, even with exponential backoff enabled (see the sketch below).
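For reference, here is roughly what my call path looks like. This is a minimal sketch, not my production code: the endpoint, API version, deployment name, and backoff parameters are placeholders.

```python
import os
import time

from openai import AzureOpenAI, RateLimitError

# Placeholder endpoint/key/version; real values come from my config.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

def chat_with_backoff(messages, max_retries=5):
    """Call the deployment, retrying 429s with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o-mini",  # Azure deployment name (placeholder)
                messages=messages,
            )
        except RateLimitError:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, ...
    raise RuntimeError("Still rate-limited after retries")
```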
I checked:
- No other deployments are drawing from the same quota.
- No spikes in usage.
- No quota exhaustion shown in the Azure portal.
- No signs of cold-start issues with the deployment.
Posts online attribute this to regional load, hidden or undocumented rate limits, or shared backend capacity, but I have not seen an authoritative explanation.
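To see whether a hidden limit is being hit, I started logging the rate-limit headers on each response. A sketch, using the same `client` as above and the openai>=1.x SDK's `with_raw_response` interface; I am assuming Azure returns the same `x-ratelimit-*` headers the OpenAI API does:

```python
# Log rate-limit headers to look for undocumented throttling.
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o-mini",  # Azure deployment name (placeholder)
    messages=[{"role": "user", "content": "ping"}],
)
print("remaining requests:", raw.headers.get("x-ratelimit-remaining-requests"))
print("remaining tokens:", raw.headers.get("x-ratelimit-remaining-tokens"))
completion = raw.parse()  # the normal ChatCompletion object
```

On a 429, `RateLimitError` exposes the raw response, so `err.response.headers.get("retry-after")` should show how long Azure actually wants me to wait.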
Has anyone looked into this in depth or found a solution that works?
Is this a known Azure-side issue, or is there something developers need to configure differently?