
Throughput

Throughput measures how much work a system completes in a given time period, typically expressed as requests per second (RPS), tasks per minute, or messages per hour. While latency measures how long a single request takes, throughput measures how many requests the system handles over time.

Throughput vs. Latency

These two metrics are related but distinct:

Metric       Measures            Example
Latency      Time per request    200 ms per request
Throughput   Requests per time   500 requests/second

A system can have low latency but low throughput (each request is fast, but it handles only one at a time), or high throughput but high latency (it handles many requests concurrently, but each one is slow).
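The relationship can be sketched with simple arithmetic: for serial per-connection processing, throughput is roughly concurrency divided by per-request latency. This is an illustrative model, not a benchmark; the function name is ours.

```python
def throughput_rps(concurrent_requests: int, latency_ms: int) -> float:
    """Steady-state throughput if each connection handles requests serially."""
    return concurrent_requests * 1000 / latency_ms

# One connection at 200 ms per request: low latency, low throughput.
print(throughput_rps(1, 200))    # 5.0 requests/sec
# 100 concurrent connections at the same 200 ms latency: 100x throughput.
print(throughput_rps(100, 200))  # 500.0 requests/sec
```

Same latency in both cases; only the concurrency, and hence the throughput, differs.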

How Async Processing Increases Throughput

Synchronous processing ties throughput to request duration:

Sync: 100 connections × (1 request / 10 seconds) = 10 requests/sec

Async: 100 connections × (1 request / 50ms) = 2,000 requests/sec
       Workers process the actual work independently

By offloading slow operations to a task queue, your web tier accepts requests in milliseconds. Throughput jumps by orders of magnitude because connections aren’t held open waiting for work to finish.
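A minimal sketch of the offloading pattern, using Python's standard-library `queue` and `threading` (a stand-in for a real task queue such as a broker-backed one; the handler and worker names are ours):

```python
import queue
import threading
import time

task_queue: "queue.Queue[str]" = queue.Queue()

def handle_request(payload: str) -> str:
    # Enqueue the slow work and return immediately,
    # instead of holding the connection open for the full task duration.
    task_queue.put(payload)
    return "accepted"

def worker() -> None:
    while True:
        payload = task_queue.get()
        time.sleep(0.01)  # stand-in for the slow operation
        task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

# The web tier accepts 100 requests almost instantly; the worker
# drains the queue independently.
for i in range(100):
    handle_request(f"task-{i}")
task_queue.join()  # for the demo only: wait until the queue is drained
```

The handler's response time is now the cost of a queue insert, not the cost of the work itself.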

Scaling Throughput with Workers

Workers process tasks from the queue independently and scale horizontally:

  • 1 worker: Sequential processing — throughput limited by task duration
  • 10 workers: 10x throughput — tasks processed in parallel
  • Auto-scaling: Workers scale up during spikes and down during quiet periods
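The scaling behavior above can be modeled with back-of-the-envelope arithmetic, assuming uniform tasks and no coordination overhead (a simplification; real queues add dispatch latency):

```python
import math

def completion_time_s(tasks: int, seconds_per_task: float, workers: int) -> float:
    """Illustrative model: tasks split evenly across identical workers."""
    return math.ceil(tasks / workers) * seconds_per_task

# 100 tasks of 2 s each:
print(completion_time_s(100, 2.0, 1))   # 200.0 s — sequential
print(completion_time_s(100, 2.0, 10))  # 20.0 s — 10x throughput
```

Doubling workers halves completion time only while the task count, not worker coordination or shared resources, is the bottleneck.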

Monitoring Throughput

Key metrics:

  • Tasks created per second: How fast work enters the system
  • Tasks completed per second: How fast work leaves the system
  • Queue depth: If growing, throughput is insufficient — scale up workers
  • Worker utilization: If consistently at 100%, add more workers
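These metrics combine naturally into an autoscaling check. The helper below is hypothetical (the function, parameter names, and 0.9 threshold are ours, not from any particular monitoring system):

```python
def should_scale_up(created_per_s: float,
                    completed_per_s: float,
                    worker_utilization: float,
                    utilization_threshold: float = 0.9) -> bool:
    """Scale up if the queue is growing or workers are saturated."""
    queue_is_growing = created_per_s > completed_per_s
    workers_saturated = worker_utilization >= utilization_threshold
    return queue_is_growing or workers_saturated

# Work arrives faster than it leaves: queue depth grows, scale up.
print(should_scale_up(120, 100, 0.7))  # True
# Healthy: completions keep pace and workers have headroom.
print(should_scale_up(80, 100, 0.6))   # False
```

Tracking created-versus-completed rates catches a growing queue before depth alone makes the problem obvious.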