# Throughput
Throughput measures how much work a system completes in a given time period — typically expressed as requests per second (RPS), tasks per minute, or messages per hour. While latency measures how fast a single request is handled, throughput measures overall request volume.
## Throughput vs. Latency
These two metrics are related but distinct:
| Metric | Measures | Example |
|---|---|---|
| Latency | Time per request | 200ms per request |
| Throughput | Requests per time | 500 requests/second |
A system can have low latency but low throughput (each request is fast, but it handles only one at a time), or high throughput but high latency (it handles many requests concurrently, but each one is slow).
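The relationship between the two follows Little's law: throughput ≈ concurrency / latency. A quick sketch with hypothetical numbers illustrating both cases above:

```python
def throughput_rps(concurrency: int, latency_s: float) -> float:
    # Little's law: requests/sec = requests in flight / time per request
    return concurrency / latency_s

# Low latency, one request at a time: fast, but low throughput
print(throughput_rps(1, 0.2))     # 5.0 requests/sec

# High latency, many requests in flight: slow per request, high throughput
print(throughput_rps(1000, 2.0))  # 500.0 requests/sec
```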
## How Async Processing Increases Throughput
Synchronous processing ties throughput to request duration; asynchronous processing decouples them:

```text
Sync:  100 connections × (1 request / 10 seconds) = 10 requests/sec
Async: 100 connections × (1 request / 50 ms)      = 2,000 requests/sec
```

In the async case, each connection lasts only long enough to enqueue the task; workers process the actual work independently.
By offloading slow operations to a task queue, your web tier accepts requests in milliseconds. Throughput jumps by orders of magnitude because connections aren’t held open waiting for work to finish.
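A minimal sketch of this offloading pattern, using Python's standard-library queue and a background thread (the handler name and payloads are hypothetical):

```python
import queue
import threading
import time

task_queue: "queue.Queue[str]" = queue.Queue()

def handle_request(payload: str) -> str:
    # Enqueue the slow work and return immediately: the connection is
    # held for microseconds instead of the full task duration.
    task_queue.put(payload)
    return "202 Accepted"

def worker() -> None:
    while True:
        payload = task_queue.get()
        time.sleep(0.001)  # stand-in for the slow operation
        task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

# 100 requests are accepted almost instantly, regardless of task duration
responses = [handle_request(f"job-{i}") for i in range(100)]
task_queue.join()  # wait for the background worker to drain the queue
```

In a production system the in-process queue would be replaced by a broker such as Redis or RabbitMQ, but the throughput effect is the same: accept time is decoupled from processing time.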
## Scaling Throughput with Workers
Workers process tasks from the queue independently and scale horizontally:
- 1 worker: Sequential processing — throughput limited by task duration
- 10 workers: 10x throughput — tasks processed in parallel
- Auto-scaling: Workers scale up during spikes and down during quiet periods
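The scaling effect above can be demonstrated directly. This sketch (hypothetical task durations; `time.sleep` stands in for I/O-bound work, which releases Python's GIL and so genuinely runs in parallel across threads) drains the same batch with 1 worker and then 10:

```python
import queue
import threading
import time

def drain(num_workers: int, num_tasks: int, task_s: float) -> float:
    """Process num_tasks tasks with num_workers parallel workers; return elapsed seconds."""
    q: "queue.Queue[int]" = queue.Queue()
    for i in range(num_tasks):
        q.put(i)

    def worker() -> None:
        while True:
            try:
                q.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            time.sleep(task_s)  # simulated I/O-bound task

    start = time.monotonic()
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start

one_worker = drain(1, 20, 0.01)   # sequential: ~20 × 10ms
ten_workers = drain(10, 20, 0.01)  # parallel: roughly 10x faster
```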
## Monitoring Throughput
Key metrics:
- Tasks created per second: How fast work enters the system
- Tasks completed per second: How fast work leaves the system
- Queue depth: If growing, throughput is insufficient — scale up workers
- Worker utilization: If consistently at 100%, add more workers
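These metrics combine into a simple scaling signal. A sketch, with hypothetical counter names and sample values, of how they might be derived from counters sampled over a measurement window:

```python
from dataclasses import dataclass

@dataclass
class QueueStats:
    # Counters sampled over one measurement window (values are illustrative)
    window_s: float
    tasks_created: int
    tasks_completed: int
    queue_depth: int
    busy_workers: int
    total_workers: int

    @property
    def inflow_rps(self) -> float:
        # Tasks created per second: how fast work enters the system
        return self.tasks_created / self.window_s

    @property
    def outflow_rps(self) -> float:
        # Tasks completed per second: how fast work leaves the system
        return self.tasks_completed / self.window_s

    @property
    def utilization(self) -> float:
        return self.busy_workers / self.total_workers

    def should_scale_up(self) -> bool:
        # Scale up when work arrives faster than it leaves,
        # or when every worker is busy
        return self.inflow_rps > self.outflow_rps or self.utilization >= 1.0

stats = QueueStats(window_s=60, tasks_created=6000, tasks_completed=4800,
                   queue_depth=1200, busy_workers=10, total_workers=10)
print(stats.inflow_rps, stats.outflow_rps, stats.should_scale_up())
# 100.0 80.0 True  -> queue is growing and workers are saturated: add workers
```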