Latency Numbers Reference for System Design
Latency numbers provide valuable context during system design, especially when discussing performance optimization, scalability, and trade-offs. Here are some common latency numbers worth referencing:
Modern Hardware Limits
Today’s servers have massive capacities that change the "distributed vs. single machine" trade-off.
Compute/Memory: Standard high-end instances (like AWS M6i) offer 128 vCPUs and 512 GB of RAM, while specialized instances can go up to 24 TB of RAM.
Storage: Local SSDs can handle 60 TB on a single instance, and HDDs can reach over 300 TB.
Networking: 25–100 Gbps is standard within data centers. Round-trip latency is under 1 ms within an Availability Zone (AZ) and ~1-2 ms between AZs in the same region.
Component Capacities (Single Node)
Component | Modern Capacity / Throughput |
Server Memory (High-end) | 4 TB (Standard) up to 24 TB (Metal) |
Local SSD Storage | 60 TB+ (e.g., AWS i3en instances) |
Database Storage (Single Node) | 5-10 TB (before sharding is strictly necessary) |
SQL Writes (Postgres/MySQL) | 10k - 50k writes/sec (well-tuned) |
SQL Reads (Indexed) | 100k+ reads/sec |
Redis Throughput | 100k - 1M operations/sec |
App Server Connections | 10k - 50k concurrent connections (Async I/O) |
Network Bandwidth | 25 Gbps - 100 Gbps |
Key Rules of Thumb
The "1TB Rule": If your total dataset is under 1TB, it can likely fit entirely in the RAM of a few high-memory cache nodes or on the disk of a single modern database instance.
The "Sharding Threshold": Don't suggest sharding a database for storage reasons unless you exceed 5-10 TB. Don't shard for write throughput unless you exceed 20k-50k writes/second.
The "Cache-First" Fallacy: Modern NVMe SSDs are so fast (10-50μs) that if your database query is a simple primary key lookup, you might not even need Redis for performance; use it for scaling read-heavy traffic or reducing DB load instead.
Concurrency: A single modern application server can handle the total traffic of most "mid-sized" startups. When designing for millions of users, think in dozens of servers, not thousands.
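The thresholds in these rules can be turned into a quick sanity check. The sketch below is only an illustration: the helper name `sizing_notes` and the returned labels are hypothetical, and it assumes the lower bound of each range above (5 TB, 20k writes/sec) as the trigger point.

```python
from __future__ import annotations

# Thresholds copied from the rules of thumb above.
# Assumption: the lower bound of each quoted range is used as the trigger.
RAM_FIT_TB = 1.0               # the "1TB Rule"
SHARD_STORAGE_TB = 5.0         # the "Sharding Threshold" (storage)
SHARD_WRITES_PER_SEC = 20_000  # the "Sharding Threshold" (writes)


def sizing_notes(dataset_tb: float, writes_per_sec: float) -> list[str]:
    """Hypothetical helper: list the rules of thumb a workload triggers."""
    notes = []
    if dataset_tb < RAM_FIT_TB:
        notes.append("fits-in-ram-or-single-db")
    if dataset_tb > SHARD_STORAGE_TB:
        notes.append("shard-for-storage")
    if writes_per_sec > SHARD_WRITES_PER_SEC:
        notes.append("shard-for-writes")
    # Nothing triggered: a single well-tuned node is probably enough.
    return notes or ["single-node-ok"]


if __name__ == "__main__":
    print(sizing_notes(0.4, 2_000))    # small dataset, light writes
    print(sizing_notes(12.0, 60_000))  # large dataset, heavy writes
    print(sizing_notes(3.0, 5_000))    # in between
```

These are coarse heuristics, so a result like "shard-for-writes" is a prompt to measure, not a decision.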
Basic Operations
L1 cache reference: ~1 nanosecond
L2 cache reference: ~7 nanoseconds
Main memory (RAM) reference: ~100 nanoseconds
SSD I/O (read/write): ~100 microseconds
Disk I/O (HDD, seek): ~10 milliseconds
When you read from a database or a remote cache, you aren't just paying for the time it takes to find the data; you are paying for the round-trip journey.
Remote Cache (e.g., Redis on a separate VM):
Internal Processing: ~0.1 ms
Network Overhead: ~0.5 ms to 1.0 ms (within the same Availability Zone)
Total: ~0.6 ms to 1.1 ms
Remote Database (e.g., Postgres/MySQL):
Internal Processing: ~5 ms to 50 ms (index lookup + disk I/O)
Network Overhead: ~0.5 ms to 1.0 ms
Total: ~5.5 ms to 51 ms
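The totals are simply processing time plus network round trip. As a minimal sketch using only the figures listed above (the function and constant names are illustrative, not from any library):

```python
def total_range_ms(processing_ms, network_ms):
    """Add two (low, high) latency ranges, both in milliseconds."""
    return (processing_ms[0] + network_ms[0], processing_ms[1] + network_ms[1])


# Figures from the breakdown above.
SAME_AZ_NETWORK_MS = (0.5, 1.0)
REDIS_PROCESSING_MS = (0.1, 0.1)
SQL_PROCESSING_MS = (5.0, 50.0)

cache_total = total_range_ms(REDIS_PROCESSING_MS, SAME_AZ_NETWORK_MS)
db_total = total_range_ms(SQL_PROCESSING_MS, SAME_AZ_NETWORK_MS)

print(f"remote cache:    {cache_total[0]:.1f} to {cache_total[1]:.1f} ms")
print(f"remote database: {db_total[0]:.1f} to {db_total[1]:.1f} ms")
```

For the remote cache, the network dominates the total; for the database, internal processing does.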
Data Processing
Reading 1 MB from RAM: ~250 microseconds
Reading 1 MB from SSD: ~1 millisecond
Reading 1 MB sequentially from HDD: ~10 milliseconds
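These per-megabyte times can be inverted into rough throughput figures, which are often easier to reason with when estimating scan times. A small sketch, assuming the three numbers above and decimal megabytes (the names are illustrative):

```python
# Time to read 1 MB, in seconds, from the list above.
READ_1MB_SECONDS = {"RAM": 250e-6, "SSD": 1e-3, "HDD": 10e-3}


def implied_mb_per_s(seconds_per_mb):
    """Throughput implied by a per-megabyte read time."""
    return 1.0 / seconds_per_mb


def scan_seconds(size_mb, seconds_per_mb):
    """Rough time to read size_mb megabytes at the same rate."""
    return size_mb * seconds_per_mb


for medium, t in READ_1MB_SECONDS.items():
    print(f"{medium}: ~{implied_mb_per_s(t):.0f} MB/s, "
          f"1 GB scan in ~{scan_seconds(1000, t):.2f} s")
```

Under these numbers a 1 GB scan takes roughly a quarter of a second from RAM, one second from SSD, and ten seconds from HDD.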
Network Latencies
1 KB data transfer on 1 Gbps network: ~10 microseconds
Round trip within the same AZ: < 1 millisecond
Round trip across AZs (same region): ~1-2 ms
Round trip between two data centers (different regions): ~60-200 ms, depending on distance
Intercontinental round trip: ~150 ms, depending on distance
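The transfer figure follows from payload size divided by link bandwidth, and a rough request time is that plus the round trip. A sketch under simplifying assumptions (no protocol overhead, no congestion, 1 KB taken as 1024 bytes; the function names are illustrative):

```python
def serialization_us(size_bytes, bandwidth_gbps):
    """Time to push size_bytes onto a link, in microseconds."""
    bits = size_bytes * 8
    return bits / (bandwidth_gbps * 1e9) * 1e6


def request_ms(size_bytes, bandwidth_gbps, rtt_ms):
    """Rough request time: round trip plus payload serialization."""
    return rtt_ms + serialization_us(size_bytes, bandwidth_gbps) / 1000.0


# 1 KB on a 1 Gbps link: about 8 microseconds, i.e. on the order of 10.
print(f"{serialization_us(1024, 1):.3f} us")

# 1 MB over a 25 Gbps link inside one AZ, assuming a 0.5 ms round trip.
print(f"{request_ms(1_000_000, 25, 0.5):.2f} ms")
```

For small payloads the round trip dominates, which is why cross-region calls are expensive regardless of bandwidth.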
Cloud Services
API gateway call latency: ~1-10 milliseconds
Query on a NoSQL database (e.g., DynamoDB): ~5-20 milliseconds
Query on an SQL database: ~5-10 milliseconds for simple queries; complex queries can take seconds.
FAQ
A main memory (RAM) reference takes 100 nanoseconds, but reading 1 MB from RAM takes 250 microseconds. Why?
Answer:
100 ns: Time to access a single memory location (latency), which is to fetch a small chunk of data (e.g., 64 bytes).
250 µs: Time to read 1 MB, including latency and transfer time.
Modern RAM modules have high bandwidth, often in the range of tens of GB/s. For example:
Assume a memory bandwidth of 20 GB/s (DDR4/DDR5 range).
Time to transfer 1 MB = 1 MB / 20 GB/s = 2^20 bytes / (20 × 10^9 bytes/s) ≈ 52 μs
However, a real read also incurs latency overheads for accessing many addresses and managing the memory bus, so the total time to read 1 MB is closer to ~250 microseconds than to the raw bandwidth estimate.
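One way to see where ~250 microseconds sits is to compute two bounds from the numbers in this answer: the pure bandwidth estimate, and a hypothetical worst case in which every 64-byte chunk pays the full 100 ns access latency with no overlap. This is a sketch for intuition only; real hardware pipelines and prefetches, so neither bound is a measurement.

```python
SIZE_BYTES = 2**20               # 1 MB
BANDWIDTH_BYTES_PER_S = 20e9     # 20 GB/s, as assumed above
CHUNK_BYTES = 64                 # size of one fetched chunk
ACCESS_LATENCY_S = 100e-9        # 100 ns per memory reference

# Lower bound: data streams at full bandwidth with no stalls.
bandwidth_bound_us = SIZE_BYTES / BANDWIDTH_BYTES_PER_S * 1e6

# Upper bound (hypothetical): every chunk waits for a full, unoverlapped access.
chunks = SIZE_BYTES // CHUNK_BYTES
serialized_bound_us = chunks * ACCESS_LATENCY_S * 1e6

print(f"bandwidth bound:  {bandwidth_bound_us:.1f} us")
print(f"chunks to fetch:  {chunks}")
print(f"serialized bound: {serialized_bound_us:.1f} us")
```

The ~250 microsecond figure falls between the two bounds, consistent with a read that is mostly streamed but not perfectly so.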
Disk I/O (HDD, seek) takes 10 milliseconds, and reading 1 MB from HDD also takes 10 milliseconds. Why?
Answer:
Disk I/O refers to the seek time, which is the delay required for the hard disk drive (HDD) to position its read/write head over the correct track on the spinning disk. This latency happens before any data is read and is independent of the data size.
Reading 1 MB from HDD: ~10 milliseconds
This is the time to transfer 1 MB of data in a sequential read, where the head is already positioned. A disk read has two components:
Seek time (~10 ms): Positioning the read head.
Data transfer time: Time to physically transfer 1 MB from the spinning disk to memory.
Modern HDDs have sequential read speeds of ~100 MB/s. Therefore:
- Transfer time for 1 MB = 1 MB / 100 MB/s = 0.01 seconds = 10 ms
For small reads (e.g., a few KB or even 1 byte), the seek time dominates, so the total latency is still close to 10 ms.
For a 1 MB read that starts with a seek, the transfer time adds to the seek time, so the total is closer to ~20 ms. The two ~10 ms figures measure different things and match only by coincidence.
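The two components can be added explicitly. A minimal sketch, assuming the ~10 ms seek and ~100 MB/s sequential rate quoted in this answer (the function name and the `needs_seek` flag are illustrative):

```python
SEEK_MS = 10.0         # head positioning, from the answer above
SEQ_MB_PER_S = 100.0   # sequential read speed, from the answer above


def hdd_read_ms(size_mb, needs_seek=True):
    """Estimated HDD read time: optional seek plus sequential transfer."""
    transfer_ms = size_mb / SEQ_MB_PER_S * 1000.0
    return (SEEK_MS if needs_seek else 0.0) + transfer_ms


print(f"4 KB random read:     {hdd_read_ms(0.004):.2f} ms")
print(f"1 MB with a seek:     {hdd_read_ms(1.0):.2f} ms")
print(f"1 MB, no seek needed: {hdd_read_ms(1.0, needs_seek=False):.2f} ms")
```

Under these assumptions a tiny random read costs about as much as the seek alone, while a 1 MB read costs ~10 ms of transfer on top of any seek it needs.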



