Latency Numbers Quick Reference Guide

Latency numbers provide valuable context during system design, especially when discussing performance optimization, scalability, and trade-offs. Here are some common figures worth referencing:

Modern Hardware Limits

Today’s servers have massive capacities that change the "distributed vs. single machine" trade-off.

Compute/Memory: Standard high-end instances (like AWS M6i) offer 128 vCPUs and 512 GB of RAM, while specialized instances can go up to 24 TB of RAM.
Storage: Local SSDs can handle 60 TB on a single instance, and HDDs can reach over 300 TB.
Networking: 25–100 Gbps is standard within data centers. Latency is sub-1ms within an Availability Zone (AZ) and ~1-2ms between AZs.

Component Capacities (Single Node)

Component	Modern Capacity / Throughput
Server Memory (High-end)	Up to 4 TB (Standard) to 24 TB (Metal)
Local SSD Storage	60 TB+ (e.g., AWS i3en instances)
Database Storage (Single Node)	5-10 TB (before sharding is strictly necessary)
SQL Writes (Postgres/MySQL)	10k - 50k writes/sec (well-tuned)
SQL Reads (Indexed)	100k+ reads/sec
Redis Throughput	100k - 1M operations/sec
App Server Connections	10k - 50k concurrent connections (Async I/O)
Network Bandwidth	25 Gbps - 100 Gbps
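One way to use this table is a quick node-count estimate: divide an expected peak load by a conservative per-node figure and round up. The sketch below is a minimal illustration that assumes the low end of each range. The metric names and peak loads are hypothetical.

```python
import math

# Conservative (low-end) per-node figures from the table above.
PER_NODE_CAPACITY = {
    "sql_writes_per_sec": 10_000,
    "sql_reads_per_sec": 100_000,
    "redis_ops_per_sec": 100_000,
    "app_connections": 10_000,
}


def nodes_needed(peak_load: int, metric: str) -> int:
    """Smallest node count whose combined capacity covers peak_load."""
    return math.ceil(peak_load / PER_NODE_CAPACITY[metric])


# Hypothetical peak loads, for illustration only.
print(nodes_needed(300_000, "sql_reads_per_sec"))  # 3
print(nodes_needed(250_000, "app_connections"))    # 25
```

This ignores headroom for spikes and failover, and it assumes load spreads evenly across nodes. Reads can usually be spread across replicas this way, but writes to a single-primary SQL database generally cannot be spread without sharding.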

Key Rules of Thumb

The "1TB Rule": If your total dataset is under 1TB, it can likely fit entirely in the RAM of a few high-memory cache nodes or on the disk of a single modern database instance.
The "Sharding Threshold": Don't suggest sharding a database for storage reasons unless you exceed 5-10 TB. Don't shard for write throughput unless you exceed 20k-50k writes/second.
The "Cache-First" Fallacy: Modern NVMe SSDs are so fast (10-50μs) that if your database query is a simple primary key lookup, you might not even need Redis for performance; use it for scaling read-heavy traffic or reducing DB load instead.
Concurrency: A single modern application server can handle almost any "mid-sized" startup's total traffic. When designing for millions of users, think in dozens of servers, not thousands.
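The "1TB Rule" and the "Sharding Threshold" lend themselves to a quick back-of-envelope check. This is a minimal sketch that assumes the lower end of each quoted threshold range. The record count, record size, and write rate are hypothetical, and the function names are illustrative only.

```python
# Back-of-envelope check against the "1TB Rule" and "Sharding Threshold".
# Thresholds use the lower end of the quoted ranges; they are rough guides.
TB = 10**12  # decimal terabyte

ONE_TB_RULE_BYTES = 1 * TB
SHARD_STORAGE_BYTES = 5 * TB
SHARD_WRITES_PER_SEC = 20_000


def dataset_bytes(records: int, bytes_per_record: int) -> int:
    """Estimate total dataset size from a record count and average record size."""
    return records * bytes_per_record


def assess(total_bytes: int, writes_per_sec: int) -> dict:
    """Apply the two rules of thumb to a size and write-rate estimate."""
    return {
        "under_1tb": total_bytes < ONE_TB_RULE_BYTES,
        "shard_for_storage": total_bytes > SHARD_STORAGE_BYTES,
        "shard_for_writes": writes_per_sec > SHARD_WRITES_PER_SEC,
    }


# Hypothetical workload: 50 million users, ~2 KB per user row, 3,000 writes/sec.
size = dataset_bytes(50_000_000, 2_000)
print(f"{size / TB:.2f} TB")  # 0.10 TB
print(assess(size, 3_000))
```

In this example every check favors a single database node. The data is about 0.1 TB, and the write rate is well under the threshold.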

Basic Operations

L1 cache reference: ~1 nanosecond
L2 cache reference: ~7 nanoseconds
Main memory (RAM) reference: ~100 nanoseconds
SSD I/O (read/write): ~100 microseconds
Disk I/O (HDD, seek): ~10 milliseconds
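To see how far apart these tiers are, the sketch below stores the approximate figures above and expresses each one as a multiple of an L1 cache hit. The dictionary and helper names are illustrative. The values are the list's order-of-magnitude numbers, not measurements.

```python
# Approximate single-operation latencies from the list above, in nanoseconds.
# These are order-of-magnitude rules of thumb, not measurements.
LATENCY_NS = {
    "L1 cache reference": 1,
    "L2 cache reference": 7,
    "Main memory reference": 100,
    "SSD I/O": 100_000,        # ~100 microseconds
    "HDD seek": 10_000_000,    # ~10 milliseconds
}


def slowdown(slow: str, fast: str) -> float:
    """How many times longer operation `slow` takes than operation `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


for name, ns in LATENCY_NS.items():
    ratio = slowdown(name, "L1 cache reference")
    print(f"{name:<22} {ns:>12,} ns  = {ratio:>12,.0f}x an L1 hit")
```

By these figures, one SSD access costs about as much as 1,000 RAM references, and a single HDD seek costs about as much as 100 SSD accesses.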

When you read from a database or a remote cache, you aren't just paying for the time it takes to find the data; you are paying for the round-trip journey.

Remote Cache (e.g., Redis on a separate VM):
- Internal Processing: ~0.1 ms
- Network Overhead: ~0.5 ms to 1.0 ms (within the same Availability Zone)
- Total: ~0.6 ms to 1.1 ms
Remote Database (e.g., Postgres/MySQL):
- Internal Processing: ~5 ms to 50 ms (index lookup + disk I/O)
- Network Overhead: ~0.5 ms to 1.0 ms
- Total: ~5.5 ms to 51 ms
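Each total above is the internal processing time plus the network overhead. The low ends and the high ends are added separately. The following minimal sketch shows that arithmetic. It uses a hypothetical `Range` helper and the same-AZ overhead figures from this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A (low, high) latency range in milliseconds."""
    low: float
    high: float

    def __add__(self, other: "Range") -> "Range":
        # Two stages in sequence: best case adds to best case, worst to worst.
        return Range(self.low + other.low, self.high + other.high)


SAME_AZ_NETWORK = Range(0.5, 1.0)

remote_cache = Range(0.1, 0.1) + SAME_AZ_NETWORK  # Redis on a separate VM
remote_db = Range(5.0, 50.0) + SAME_AZ_NETWORK    # Postgres/MySQL

print(f"Remote cache: {remote_cache.low:.1f}-{remote_cache.high:.1f} ms")  # 0.6-1.1 ms
print(f"Remote DB:    {remote_db.low:.1f}-{remote_db.high:.1f} ms")        # 5.5-51.0 ms
```

For the cache, the network round trip makes up most of the cost. For the database, the processing time usually dominates.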

Data Processing

Reading 1 MB from RAM: ~250 microseconds
Reading 1 MB from SSD: ~1 millisecond
Reading 1 MB from HDD: ~10 milliseconds
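For large sequential scans, these per-MB figures scale roughly linearly. That makes it easy to estimate how long a full scan might take. The sketch below makes several simplifying assumptions: a single sequential stream from one device, no parallelism or caching, and decimal units. `scan_seconds` is a hypothetical helper.

```python
# Approximate cost to read 1 MB sequentially, from the list above (milliseconds).
MS_PER_MB = {"RAM": 0.25, "SSD": 1.0, "HDD": 10.0}


def scan_seconds(gigabytes: float, medium: str) -> float:
    """Estimate seconds to read `gigabytes` sequentially from `medium`."""
    megabytes = gigabytes * 1_000  # decimal units keep the arithmetic simple
    return megabytes * MS_PER_MB[medium] / 1_000


for medium in MS_PER_MB:
    print(f"Scan 100 GB from {medium}: ~{scan_seconds(100, medium):,.0f} s")
```

Under these assumptions, a 100 GB scan takes about 25 s from RAM, 100 s from SSD, and 1,000 s (roughly 17 minutes) from HDD.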

Network Latencies

1 KB data transfer on 1 Gbps network: ~10 microseconds
Round trip within the same AZ: < 1 millisecond
Round trip across AZs (same region): ~1-2 ms
Round trip between data centers in different regions: ~60-200 ms, depending on distance
Round trip between continents: ~150 ms, depending on distance
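Two simple components help explain these figures. Serialization is the time to put the bits on the wire. Propagation is the time for the signal to travel the distance. The minimal sketch below computes both. It assumes that light in optical fiber travels at roughly 200,000 km/s (about two-thirds of its speed in vacuum), and it uses a hypothetical 9,000 km path. The helper names are illustrative.

```python
FIBER_KM_PER_SEC = 200_000  # assumption: ~2/3 of the speed of light in vacuum


def serialization_us(num_bytes: int, link_gbps: float) -> float:
    """Microseconds to transmit num_bytes on a link of link_gbps."""
    bits = num_bytes * 8
    return bits / (link_gbps * 1e9) * 1e6


def min_rtt_ms(distance_km: float) -> float:
    """Physical lower bound on round-trip time over a fiber path of distance_km."""
    return 2 * distance_km / FIBER_KM_PER_SEC * 1e3


print(f"1 KB on 1 Gbps: ~{serialization_us(1024, 1):.1f} µs")  # ~8.2 µs
print(f"Min RTT over 9,000 km: ~{min_rtt_ms(9_000):.0f} ms")    # ~90 ms
```

The first figure matches the ~10 µs rule of thumb. Real paths are longer than the straight-line distance and add routing and queuing delays, so observed round trips sit well above the propagation floor.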

Cloud Services

API gateway call latency: ~1-10 milliseconds
Query on a NoSQL database (e.g., DynamoDB): ~5-20 milliseconds
Query on an SQL database: ~5-10 milliseconds for simple queries; complex queries can take seconds.
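When calls happen one after another, their latencies add up. The minimal sketch below sums the ranges above for a hypothetical request. In that request, traffic passes through an API gateway, and the service then makes one NoSQL lookup followed by one simple SQL query. The step names and the helper are illustrative.

```python
# Hypothetical request path; (low_ms, high_ms) ranges from the Cloud Services list.
STEPS = {
    "API gateway": (1, 10),
    "NoSQL query (e.g., DynamoDB)": (5, 20),
    "Simple SQL query": (5, 10),
}


def sequential_budget(steps: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sequential calls add up: sum the low ends and the high ends separately."""
    low = sum(lo for lo, _ in steps.values())
    high = sum(hi for _, hi in steps.values())
    return low, high


low, high = sequential_budget(STEPS)
print(f"Estimated latency: {low}-{high} ms")  # Estimated latency: 11-40 ms
```

A worst case of ~40 ms is spent before any application logic runs. That matters when the end-to-end target is, for example, a 100 ms response.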

FAQ

Main memory (RAM) reference is ~100 nanoseconds, but reading 1 MB from RAM takes ~250 microseconds. Why?

Answer:

100 ns: The latency of a single memory access, which fetches one small chunk of data (e.g., a 64-byte cache line).

250 µs: The time to read 1 MB, including both latency and transfer time.
- Modern RAM has high bandwidth, typically in the range of tens of GB/s. For example:
  - Assume a memory bandwidth of 20 GB/s (DDR4/DDR5 range).
  - Time to transfer 1 MB = 2^20 bytes / (20 × 10^9 bytes/s) ≈ 52 µs, or roughly 50 µs.
However, the transfer also incurs overhead for accessing many successive addresses and managing the memory bus. This is why the rule-of-thumb figure for reading 1 MB is ~250 µs rather than the raw bandwidth estimate. The ~250 µs figure comes from older, widely cited latency tables. On current hardware, prefetching lets large sequential reads run much closer to the raw bandwidth estimate, so treat it as a conservative upper bound.
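The raw-bandwidth part of this estimate can be reproduced directly, as in the small sketch below. The 10 and 50 GB/s values are hypothetical alternatives to the 20 GB/s assumed above. The calculation deliberately leaves out the access overheads mentioned in this answer, so it shows only the lower bound set by bandwidth.

```python
MIB = 2**20  # 1 MB taken as 2^20 bytes, as in the calculation above


def transfer_us(num_bytes: int, bandwidth_gb_per_s: float) -> float:
    """Raw transfer time in microseconds at a bandwidth in decimal GB/s."""
    return num_bytes / (bandwidth_gb_per_s * 1e9) * 1e6


for bw in (10, 20, 50):
    print(f"1 MB at {bw} GB/s: ~{transfer_us(MIB, bw):.0f} µs")
```

At 20 GB/s the sketch prints ~52 µs, which matches the ≈50 µs estimate.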

Disk I/O (HDD seek) is ~10 milliseconds, and reading 1 MB from HDD is also ~10 milliseconds. Why?

Answer:

Disk I/O refers to the seek time, which is the delay required for the hard disk drive (HDD) to position its read/write head over the correct track on the spinning disk. This latency happens before any data is read and is independent of the data size.

Reading 1 MB from HDD: ~10 milliseconds
- This figure assumes a sequential read. The head is already positioned, so the cost is almost entirely data transfer:
  - Modern HDDs have sequential read speeds of ~100 MB/s. Therefore:
    - Transfer time for 1 MB = 1 MB / 100 MB/s = 0.01 seconds = 10 ms
- A random 1 MB read pays both costs:
  1. Seek time (~10 ms): Positioning the read head.
  2. Data transfer time (~10 ms): Moving 1 MB from the spinning disk to memory.
  Total: ~20 ms.

The two numbers match only because the seek time and the 1 MB transfer time happen to be about the same size. They measure different things.

For small reads (e.g., a few KB or even 1 byte), the seek time dominates, so a random read still costs close to 10 ms.

For larger reads, transfer time grows linearly with size and soon outweighs the one-time seek cost. This is why HDDs reward large sequential reads.
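The seek-plus-transfer reasoning can be written as a tiny cost model. This minimal sketch assumes ~10 ms per seek (from the Basic Operations list) and ~100 MB/s sequential throughput. `hdd_read_ms` is a hypothetical helper. Real drives vary with rotation speed, track position, and on-drive caching.

```python
SEEK_MS = 10.0          # seek time, from the Basic Operations list
SEQ_MB_PER_S = 100.0    # assumed sequential transfer rate


def hdd_read_ms(size_mb: float, random_access: bool = True) -> float:
    """Estimated milliseconds to read size_mb from an HDD.

    A random read pays one seek before transferring; a sequential read
    continues from the current head position and pays transfer time only.
    """
    transfer_ms = size_mb / SEQ_MB_PER_S * 1000
    return (SEEK_MS if random_access else 0.0) + transfer_ms


print(hdd_read_ms(0.004))                   # 4 KB random read: seek dominates
print(hdd_read_ms(1, random_access=False))  # 10.0
print(hdd_read_ms(1))                       # 20.0
```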
