Rate limits
To mitigate misuse and manage capacity on our API, we limit how much an organization can use the Claude API.
We have two types of limits:
- Spend limits set a maximum monthly cost an organization can incur for API usage.
- Rate limits restrict the number of API requests an organization can make over a defined period of time.
We enforce service-configured limits at the organization level, but you may also set user-configurable limits for your organization’s workspaces.
About our limits
- Limits are designed to prevent API abuse, while minimizing impact on common customer usage patterns.
- Limits are defined by usage tier, where each tier is associated with a different set of spend and rate limits.
- Your organization will advance to higher tiers automatically as you reach certain thresholds while using the API.
- Limits are set at the organization level. You can see your organization’s limits in the Limits page in the Anthropic Console.
- You may hit rate limits over shorter time intervals. For instance, a rate of 60 requests per minute (RPM) may be enforced as 1 request per second. Short bursts of requests at a high volume can surpass the rate limit and result in rate limit errors.
- The limits outlined below are our standard limits. If you’re seeking higher, custom limits, contact sales through the Anthropic Console.
- We use the token bucket algorithm to do rate limiting.
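To make the token bucket idea concrete, here is a minimal client-side sketch in Python. It is not the API's server-side implementation, whose parameters are not documented here. The class name `TokenBucket` and the `capacity`, `refill_per_second`, and `clock` parameters are assumptions made for illustration.

```python
import time


class TokenBucket:
    """Illustrative token bucket: holds up to `capacity` tokens, refilled continuously."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self.clock = clock              # injectable so the bucket can be tested without sleeping
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now

    def try_acquire(self, cost=1.0):
        """Take `cost` tokens if available; return True on success, False otherwise."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Under these assumptions, `TokenBucket(capacity=1, refill_per_second=1.0)` models a 60 RPM limit enforced as 1 request per second: a second request sent immediately after the first is refused, while one sent a full second later succeeds. A larger `capacity` would tolerate short bursts.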
Spend limits
Each usage tier has a limit on how much you can spend on the API each calendar month. Once you reach your tier’s spend limit, you cannot use the API again until the next month, unless you qualify for the next tier first.
To qualify for the next tier, you must meet a deposit requirement and a mandatory wait period. Higher tiers require longer wait periods. Note that, to minimize the risk of overfunding your account, you cannot deposit more than your monthly spend limit.
Requirements to advance tier
Usage Tier | Credit Purchase | Wait After First Purchase | Max Usage per Month |
---|---|---|---|
Tier 1 | $5 | 0 days | $100 |
Tier 2 | $40 | 7 days | $500 |
Tier 3 | $200 | 7 days | $1,000 |
Tier 4 | $400 | 14 days | $5,000 |
Monthly Invoicing | N/A | N/A | N/A |
Rate limits
Our rate limits are currently measured in requests per minute, tokens per minute, and tokens per day for each model class. If you exceed any of the rate limits, you will get a 429 error.
Rate limits are tracked per model, so models within the same tier do not share a rate limit.
Model | Requests per minute (RPM) | Tokens per minute (TPM) | Tokens per day (TPD) |
---|---|---|---|
Claude 3.5 Sonnet 2024-10-22 | 50 | 40,000 | 1,000,000 |
Claude 3.5 Sonnet 2024-06-20 | 50 | 40,000 | 1,000,000 |
Claude 3 Opus | 50 | 20,000 | 1,000,000 |
Claude 3 Sonnet | 50 | 40,000 | 1,000,000 |
Claude 3 Haiku | 50 | 50,000 | 5,000,000 |
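Since exceeding any of these limits returns a 429 error, a client usually needs a retry strategy. The sketch below is one possible approach, not an official client. `send` is a hypothetical callable that performs the request and returns `(status_code, headers, body)`, with `headers` as a dict keyed by lowercase header names. The fallback to exponential backoff when `retry-after` is absent is also an assumption.

```python
import time


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status_code, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        retry_after = headers.get("retry-after")
        if retry_after is not None:
            # Prefer the wait time the server reports, in seconds.
            delay = float(retry_after)
        else:
            # Assumed fallback: exponential backoff (1s, 2s, 4s, ...).
            delay = base_delay * (2 ** attempt)
        sleep(delay)
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

Passing `sleep` as a parameter keeps the function easy to test. In production the default `time.sleep` applies.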
Setting lower limits for Workspaces
To protect Workspaces in your Organization from potential overuse, you can set custom spend and rate limits per Workspace.
Example: If your Organization’s limit is 80,000 tokens per minute, you might limit one Workspace to 30,000 tokens per minute. This protects other Workspaces from potential overuse and ensures a more equitable distribution of resources across your Organization. The remaining 50,000 tokens per minute (or more, if that Workspace doesn’t use the limit) are then available for other Workspaces to use.
Note:
- You can’t set limits on the default Workspace.
- If not set, Workspace limits match the Organization’s limit.
- Organization-wide limits always apply, even if Workspace limits add up to more.
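The rules above can be sketched as a small calculation. The function names are hypothetical, and the code only models the arithmetic described in this section; it does not call any API.

```python
def effective_workspace_limit(org_limit, workspace_limit=None):
    """Return the ceiling that applies to a single Workspace."""
    if workspace_limit is None:
        # If not set, the Workspace limit matches the Organization's limit.
        return org_limit
    # The Organization-wide limit always applies, so it caps the Workspace limit.
    return min(workspace_limit, org_limit)


def left_for_other_workspaces(org_limit, workspace_usage):
    """Return the capacity other Workspaces can still use."""
    return max(org_limit - workspace_usage, 0)


# The example from this section: an 80,000 TPM Organization limit
# with one Workspace capped at 30,000 TPM.
capped = effective_workspace_limit(80_000, 30_000)
remaining = left_for_other_workspaces(80_000, capped)
```

With these inputs, `capped` is 30,000 and `remaining` is 50,000, matching the example. If the capped Workspace uses less than its limit, `left_for_other_workspaces` returns correspondingly more.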
Response headers
The API response includes headers that show you the rate limit enforced, current usage, and when the limit will be reset.
The following headers are returned:
Header | Description |
---|---|
anthropic-ratelimit-requests-limit | The maximum number of requests allowed within any rate limit period. |
anthropic-ratelimit-requests-remaining | The number of requests remaining before being rate limited. |
anthropic-ratelimit-requests-reset | The time when the request rate limit will reset, provided in RFC 3339 format. |
anthropic-ratelimit-tokens-limit | The maximum number of tokens allowed within any rate limit period. |
anthropic-ratelimit-tokens-remaining | The number of tokens remaining (rounded to the nearest thousand) before being rate limited. |
anthropic-ratelimit-tokens-reset | The time when the token rate limit will reset, provided in RFC 3339 format. |
retry-after | The number of seconds until you can retry the request. |
The tokens rate limit headers display the values for the limit (daily or per-minute) with fewer tokens remaining. For example, if you have exceeded the daily token limit but have not sent any tokens within the last minute, the headers will contain the daily token rate limit values.
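As an illustration of how a client might use these headers, the following sketch computes how long to wait before sending the next request. It assumes the headers have already been collected into a dict keyed by lowercase names. The function names are hypothetical, and so is the rule of waiting only when a `remaining` value has reached zero.

```python
from datetime import datetime, timezone


def parse_rfc3339(value):
    """Parse an RFC 3339 timestamp such as 2024-11-01T12:00:30Z."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def seconds_until_reset(headers, now=None):
    """Return how long to wait, in seconds, before the next request."""
    if "retry-after" in headers:
        return float(headers["retry-after"])
    now = now or datetime.now(timezone.utc)
    waits = []
    for kind in ("requests", "tokens"):
        remaining = headers.get("anthropic-ratelimit-%s-remaining" % kind)
        reset = headers.get("anthropic-ratelimit-%s-reset" % kind)
        # Only wait on a limit that is actually exhausted.
        if remaining is not None and reset is not None and int(remaining) <= 0:
            waits.append((parse_rfc3339(reset) - now).total_seconds())
    # Never return a negative wait, even if a reset time is already in the past.
    return max(waits + [0.0])
```

Because the token headers report whichever limit (daily or per-minute) has fewer tokens remaining, the token reset time returned here may be either the next minute or the next day.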