SP-API rate limits punish naive integrations. How Amazon throttles requests, what the limits actually are, and how to design ingestion that scales.
Amazon’s Selling Partner API throttles aggressively. Most teams that build their own integrations discover this the hard way — a job that worked fine in development hits 429 errors in production, retries make it worse, and ingestion stalls. Working within rate limits is half the engineering work behind any real Amazon data layer.
This guide covers what the limits actually are, how Amazon enforces them, and how to design ingestion around them.
TL;DR: SP-API uses dynamic per-endpoint rate limits with a token bucket model. Limits vary by endpoint, by seller account size and by Amazon’s server load. Production ingestion needs request queueing, exponential backoff on 429 errors, and concurrency-aware scheduling per endpoint. Naive parallel calls hit limits within seconds. The right architecture treats rate limits as a hard constraint and orchestrates accordingly.
Each endpoint has its own rate limit, expressed as requests per second and a burst quota. Amazon uses a token bucket model: you accumulate tokens up to a maximum, and each request consumes one. When the bucket is empty, requests get 429-throttled.
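To make the model concrete, here is a minimal client-side sketch of that token accounting in Python. The rate and burst values are illustrative placeholders, not any endpoint's real limits; tracking your own bucket lets you pause before Amazon throttles you rather than after.

```python
import time

class TokenBucket:
    """Client-side mirror of the token-bucket model SP-API applies
    server-side. Tokens refill continuously at `rate` per second up to
    `burst`; each request spends one token. The numbers used below are
    illustrative, not the limits of any specific endpoint."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # empty bucket: the live API would answer HTTP 429

# e.g. a hypothetical endpoint allowing 0.5 requests/sec with a burst of 10
bucket = TokenBucket(rate=0.5, burst=10)
```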
Limits are dynamic. They depend on:

- the endpoint being called
- the size of the seller account
- Amazon's current server load

Posted rate limits are a baseline. Real limits drift, so treat the published table as a starting point and measure what your account actually gets.
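Most SP-API operations make that measurement possible by reporting the rate currently applied to you in the `x-amzn-RateLimit-Limit` response header (a few operations omit it). A sketch of reading it, assuming a `requests`-style response object:

```python
def observed_rate_limit(response) -> float | None:
    """Read the rate limit Amazon says currently applies to this call.
    Most SP-API operations return it in the x-amzn-RateLimit-Limit
    header, but not all, so treat a missing header as 'unknown'.
    `response` is assumed to expose a dict-like `.headers`."""
    value = response.headers.get("x-amzn-RateLimit-Limit")
    return float(value) if value is not None else None
```

Feeding this observed value back into your scheduler beats hard-coding the published numbers.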
How that plays out differs by API family:

- Reports API: reports take time to generate, so the constraint is not just request rate but report queue depth. Requesting many reports concurrently fails because Amazon queues your jobs.
- Product Pricing API: Amazon expects high-frequency calls here for repricing, so limits are higher, but aggressive polling on a large catalog still hits them.
- Catalog Items API: tight limits per request. Pulling catalog data for thousands of ASINs needs careful batching (see the sketch after this list).
- Orders API: reasonable limits, but pagination and date-range filtering matter for keeping request counts down.
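For the catalog case specifically, a minimal batching sketch. The batch size of 20 identifiers per call is an assumption based on commonly cited Catalog Items API limits; verify it against the API version you target.

```python
from typing import Iterable, Iterator

def asin_batches(asins: Iterable[str], size: int = 20) -> Iterator[list[str]]:
    """Split a large ASIN set into request-sized batches so each
    catalog call stays within the identifiers-per-request cap
    (size=20 is an assumption; check the current Catalog Items docs)."""
    batch: list[str] = []
    for asin in asins:
        batch.append(asin)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each batch then flows through the per-endpoint queueing and backoff machinery described next, instead of being fired in parallel.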
Get this wrong and the result is hours-long ingestion windows, missing data, and silent failures.
The fix is architectural. First, give each endpoint its own queue with its own concurrency limit: hit a 429 on one endpoint and only that endpoint slows down.
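A minimal sketch of that isolation with asyncio; the endpoint names and concurrency numbers are hypothetical placeholders, not Amazon-documented values.

```python
import asyncio

# Hypothetical per-endpoint concurrency caps; real values come from
# the published limits plus what you observe in production.
ENDPOINT_CONCURRENCY = {"getOrders": 2, "createReport": 1, "getPricing": 4}

class EndpointQueues:
    """One semaphore per endpoint, so throttling on one endpoint
    never stalls traffic to the others."""

    def __init__(self, limits: dict[str, int]):
        self._sems = {name: asyncio.Semaphore(n) for name, n in limits.items()}

    async def run(self, endpoint: str, call):
        async with self._sems[endpoint]:
            return await call()

queues = EndpointQueues(ENDPOINT_CONCURRENCY)
```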
Second, back off exponentially on 429s: first retry after 1 second, then 2, then 4, then 8. Reset the delay on success.
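That schedule, sketched below. The jitter is an addition on top of the schedule above, to keep parallel workers from retrying in lockstep; `Throttled` is a hypothetical exception your request layer would raise on HTTP 429.

```python
import asyncio
import random

class Throttled(Exception):
    """Raised by the request layer when SP-API answers HTTP 429."""

async def call_with_backoff(call, max_retries: int = 5):
    """Exponential backoff on throttling: 1s, 2s, 4s, 8s, ...
    The delay resets on success because each fresh call starts at 1s."""
    delay = 1.0
    for _ in range(max_retries):
        try:
            return await call()
        except Throttled:
            # small random jitter so concurrent workers desynchronize
            await asyncio.sleep(delay + random.uniform(0, delay * 0.25))
            delay *= 2
    raise Throttled("gave up after repeated 429s")
```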
Third, adapt concurrency from feedback. Track the success rate per endpoint over short windows, and if the error rate climbs above a threshold, lower concurrency proactively rather than waiting for throttling to cascade.
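A sketch of that feedback loop; the window size and error threshold are illustrative knobs, not Amazon-documented values.

```python
from collections import deque

class AdaptiveLimiter:
    """Shrinks an endpoint's concurrency target when the recent
    429 rate crosses a threshold, before throttling cascades."""

    def __init__(self, start: int = 5, window: int = 50,
                 threshold: float = 0.1, floor: int = 1):
        self.limit = start
        self.threshold = threshold
        self.floor = floor
        self._outcomes = deque(maxlen=window)  # True = request was throttled

    def record(self, throttled: bool) -> int:
        self._outcomes.append(throttled)
        if len(self._outcomes) == self._outcomes.maxlen:
            error_rate = sum(self._outcomes) / len(self._outcomes)
            if error_rate > self.threshold and self.limit > self.floor:
                self.limit -= 1          # back off before Amazon forces you to
                self._outcomes.clear()   # start a fresh observation window
        return self.limit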
Fourth, checkpoint long-running ingestion. A full backfill or full catalog refresh that crashes mid-job should resume from the last checkpoint, not restart from scratch.
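A minimal checkpoint sketch; the file name and cursor field are hypothetical, and a real system would more likely keep this state in the same database as the ingested data.

```python
import json
import os

CHECKPOINT_PATH = "orders_backfill.ckpt"  # hypothetical location

def load_checkpoint() -> dict:
    """Resume point for the backfill; falls back to the job's start date."""
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH) as f:
            return json.load(f)
    return {"last_updated_after": "2024-01-01T00:00:00Z"}

def save_checkpoint(state: dict) -> None:
    """Write-then-rename so a crash never leaves a half-written file."""
    tmp = CHECKPOINT_PATH + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT_PATH)

# In the ingestion loop: persist each page of results first, then
# advance and save the cursor, so a crash resumes instead of restarting.
```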
Finally, schedule by priority. Real-time, customer-facing requests come first; batch backfill yields whenever high-priority work arrives.
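One way to express that yielding is a priority queue shared by all producers; two priority levels are enough for the split described above.

```python
import asyncio
import itertools

REALTIME, BACKFILL = 0, 1          # lower number wins
_seq = itertools.count()           # tie-breaker: FIFO within a priority

work: asyncio.PriorityQueue = asyncio.PriorityQueue()

async def submit(priority: int, call) -> None:
    await work.put((priority, next(_seq), call))

async def worker() -> None:
    """Drains the queue; backfill items only run when no
    real-time work is waiting."""
    while True:
        _prio, _, call = await work.get()
        await call()               # the rate-limited request wrapper
        work.task_done()
```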
This infrastructure is not exciting. It is also not optional. Teams that build their own SP-API integration spend significant engineering time on:

- per-endpoint request queueing and token accounting
- retry and exponential-backoff logic for 429s
- concurrency-aware scheduling and adaptive throttling
- checkpointed, resumable backfills
- priority scheduling between real-time and batch work

It is a multi-month project, and it has to be maintained as Amazon changes its limits.
SP-API rate limits are the unglamorous reason most home-grown Amazon integrations stall. The architecture to handle them properly is well understood but real engineering work — the kind that should run on top of a maintained data layer instead of being rebuilt by every team.
DataDoe handles SP-API rate limit orchestration, retries, queueing and backfills as part of the Amazon data layer so your team can query clean data instead of fighting throttling.