# Throughput
Throughput measures how much work a system completes in a given time period — typically expressed as requests per second (RPS), tasks per minute, or messages per hour. While latency measures how fast a single request is handled, throughput measures overall request volume.
## Throughput vs. Latency
These two metrics are related but distinct:
| Metric | Measures | Example |
|---|---|---|
| Latency | Time per request | 200ms per request |
| Throughput | Requests per time | 500 requests/second |
A system can have low latency but low throughput (each request is fast, but it handles only one at a time), or high throughput but high latency (it handles many requests concurrently, but each one is slow).
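The relationship between the two follows Little's law: throughput ≈ concurrency / latency. A quick sketch with hypothetical numbers illustrating both cases above:

```python
def throughput_rps(concurrency: int, latency_s: float) -> float:
    # Little's law: requests/sec = requests in flight / time per request
    return concurrency / latency_s

# Low latency, one request at a time: fast, but low throughput
print(throughput_rps(1, 0.2))     # 5.0 requests/sec

# High latency, many requests in flight: slow per request, high throughput
print(throughput_rps(1000, 2.0))  # 500.0 requests/sec
```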
## How Async Processing Increases Throughput
Synchronous processing ties throughput to request duration; asynchronous processing decouples them:

```text
Sync:  100 connections × (1 request / 10 seconds) = 10 requests/sec
Async: 100 connections × (1 request / 50 ms)      = 2,000 requests/sec
```

In the async case, each connection lasts only long enough to enqueue the task; workers process the actual work independently.
By offloading slow operations to a task queue, your web tier accepts requests in milliseconds. Throughput jumps by orders of magnitude because connections aren’t held open waiting for work to finish.
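A minimal sketch of this offloading pattern, using Python's standard-library queue and a background thread (the handler name and payloads are hypothetical):

```python
import queue
import threading
import time

task_queue: "queue.Queue[str]" = queue.Queue()

def handle_request(payload: str) -> str:
    # Enqueue the slow work and return immediately: the connection is
    # held for microseconds instead of the full task duration.
    task_queue.put(payload)
    return "202 Accepted"

def worker() -> None:
    while True:
        payload = task_queue.get()
        time.sleep(0.001)  # stand-in for the slow operation
        task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

# 100 requests are accepted almost instantly, regardless of task duration
responses = [handle_request(f"job-{i}") for i in range(100)]
task_queue.join()  # wait for the background worker to drain the queue
```

In a production system the in-process queue would be replaced by a broker such as Redis or RabbitMQ, but the throughput effect is the same: accept time is decoupled from processing time.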
## Scaling Throughput with Workers
Workers process tasks from the queue independently and scale horizontally:
- 1 worker: Sequential processing — throughput limited by task duration
- 10 workers: 10x throughput — tasks processed in parallel
- Auto-scaling: Workers scale up during spikes and down during quiet periods
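The scaling effect above can be demonstrated directly. This sketch (hypothetical task durations; `time.sleep` stands in for I/O-bound work, which releases Python's GIL and so genuinely runs in parallel across threads) drains the same batch with 1 worker and then 10:

```python
import queue
import threading
import time

def drain(num_workers: int, num_tasks: int, task_s: float) -> float:
    """Process num_tasks tasks with num_workers parallel workers; return elapsed seconds."""
    q: "queue.Queue[int]" = queue.Queue()
    for i in range(num_tasks):
        q.put(i)

    def worker() -> None:
        while True:
            try:
                q.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            time.sleep(task_s)  # simulated I/O-bound task

    start = time.monotonic()
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start

one_worker = drain(1, 20, 0.01)   # sequential: ~20 × 10ms
ten_workers = drain(10, 20, 0.01)  # parallel: roughly 10x faster
```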
## Monitoring Throughput
Key metrics:
- Tasks created per second: How fast work enters the system
- Tasks completed per second: How fast work leaves the system
- Queue depth: If growing, throughput is insufficient — scale up workers
- Worker utilization: If consistently at 100%, add more workers
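These metrics combine into a simple scaling signal. A sketch, with hypothetical counter names and sample values, of how they might be derived from counters sampled over a measurement window:

```python
from dataclasses import dataclass

@dataclass
class QueueStats:
    # Counters sampled over one measurement window (values are illustrative)
    window_s: float
    tasks_created: int
    tasks_completed: int
    queue_depth: int
    busy_workers: int
    total_workers: int

    @property
    def inflow_rps(self) -> float:
        # Tasks created per second: how fast work enters the system
        return self.tasks_created / self.window_s

    @property
    def outflow_rps(self) -> float:
        # Tasks completed per second: how fast work leaves the system
        return self.tasks_completed / self.window_s

    @property
    def utilization(self) -> float:
        return self.busy_workers / self.total_workers

    def should_scale_up(self) -> bool:
        # Scale up when work arrives faster than it leaves,
        # or when every worker is busy
        return self.inflow_rps > self.outflow_rps or self.utilization >= 1.0

stats = QueueStats(window_s=60, tasks_created=6000, tasks_completed=4800,
                   queue_depth=1200, busy_workers=10, total_workers=10)
print(stats.inflow_rps, stats.outflow_rps, stats.should_scale_up())
# 100.0 80.0 True  -> queue is growing and workers are saturated: add workers
```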