Latency Numbers Reference for System Design

Latency numbers provide valuable context during system design, especially when discussing performance optimization, scalability, and trade-offs. Here are some common latency numbers worth referencing:

Modern Hardware Limits

Today’s servers have massive capacities that change the "distributed vs. single machine" trade-off.

  • Compute/Memory: Standard high-end instances (like AWS M6i) offer 128 vCPUs and 512 GB of RAM, while specialized instances can go up to 24 TB of RAM.

  • Storage: Local SSDs can handle 60 TB on a single instance, and HDDs can reach over 300 TB.

  • Networking: 25–100 Gbps is standard within data centers. Latency is under 1 ms within an Availability Zone (AZ) and ~1-2 ms between AZs.


Component Capacities (Single Node)

| Component | Modern Capacity / Throughput |
| --- | --- |
| Server Memory (High-end) | Up to 4 TB (Standard) to 24 TB (Metal) |
| Local SSD Storage | 60 TB+ (e.g., AWS i3en instances) |
| Database Storage (Single Node) | 5-10 TB (before sharding is strictly necessary) |
| SQL Writes (Postgres/MySQL) | 10k - 50k writes/sec (well-tuned) |
| SQL Reads (Indexed) | 100k+ reads/sec |
| Redis Throughput | 100k - 1M operations/sec |
| App Server Connections | 10k - 50k concurrent connections (Async I/O) |
| Network Bandwidth | 25 Gbps - 100 Gbps |

Key Rules of Thumb

  • The "1 TB Rule": If your total dataset is under 1 TB, it can likely fit entirely in the RAM of a few high-memory cache nodes or on the disk of a single modern database instance.

  • The "Sharding Threshold": Don't suggest sharding a database for storage reasons unless you exceed 5-10 TB. Don't shard for write throughput unless you exceed 20k-50k writes/second.

  • The "Cache-First" Fallacy: Modern NVMe SSDs are so fast (10-50 μs) that if your database query is a simple primary key lookup, you might not even need Redis for performance; use it for scaling read-heavy traffic or reducing DB load instead.

  • Concurrency: One single modern application server can handle almost any "mid-sized" startup’s total traffic. When designing for millions of users, think in dozens of servers, not thousands.
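The sharding rule of thumb can be written down as a quick check. This is only a sketch: the function name `sharding_reasons` is hypothetical, and the default thresholds are an assumption that takes the low end of the ranges quoted above (5 TB and 20k writes/second).

```python
# Thresholds are assumptions taken from the low end of the ranges quoted above.
STORAGE_LIMIT_TB = 5.0
WRITE_LIMIT_PER_SEC = 20_000


def sharding_reasons(dataset_tb, writes_per_sec,
                     storage_limit_tb=STORAGE_LIMIT_TB,
                     write_limit_per_sec=WRITE_LIMIT_PER_SEC):
    """Return the reasons (possibly none) that make sharding worth discussing."""
    reasons = []
    if dataset_tb > storage_limit_tb:
        reasons.append("storage")
    if writes_per_sec > write_limit_per_sec:
        reasons.append("write throughput")
    return reasons


print(sharding_reasons(0.8, 3_000))   # []
print(sharding_reasons(12, 60_000))   # ['storage', 'write throughput']
```

An empty list means a single well-tuned node is probably still a reasonable starting point under these assumptions.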


Basic Operations

  • L1 cache reference: ~1 nanosecond

  • L2 cache reference: ~7 nanoseconds

  • Main memory (RAM) reference: ~100 nanoseconds

  • SSD I/O (read/write): ~100 microseconds

  • Disk I/O (HDD, seek): ~10 milliseconds

When you read from a database or a remote cache, you aren't just paying for the time it takes to find the data; you are paying for the round-trip journey.

  • Remote Cache (e.g., Redis on a separate VM):

    • Internal Processing: ~0.1 ms

    • Network Overhead: ~0.5 ms to 1.0 ms (within the same Availability Zone)

    • Total: ~0.6 ms to 1.1 ms

  • Remote Database (e.g., Postgres/MySQL):

    • Internal Processing: ~5 ms to 50 ms (index lookup + disk I/O)

    • Network Overhead: ~0.5 ms to 1.0 ms

    • Total: ~5.5 ms to 51 ms
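The totals above are just processing time plus network overhead, added at both ends of each range. A minimal sketch of that arithmetic, using a hypothetical helper `round_trip_ms` and the figures from the two lists:

```python
def round_trip_ms(processing_ms, network_ms):
    """Add (low, high) processing and network ranges, all in milliseconds."""
    low = processing_ms[0] + network_ms[0]
    high = processing_ms[1] + network_ms[1]
    return low, high


SAME_AZ_NETWORK_MS = (0.5, 1.0)  # network overhead quoted above

cache_low, cache_high = round_trip_ms((0.1, 0.1), SAME_AZ_NETWORK_MS)
db_low, db_high = round_trip_ms((5.0, 50.0), SAME_AZ_NETWORK_MS)

print(f"remote cache: {cache_low:.1f} to {cache_high:.1f} ms")
print(f"remote database: {db_low:.1f} to {db_high:.1f} ms")
```

For the cache, the network hop is most of the total; for the database, internal processing dominates and the network is close to noise.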


Data Processing

  • Reading 1 MB from RAM: ~250 microseconds

  • Reading 1 MB from SSD: ~1 millisecond

  • Reading 1 MB from HDD: ~10 milliseconds
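These per-megabyte figures convert directly into throughput, which makes it easy to estimate larger scans. The sketch below assumes reads scale linearly with size and treats 1 MB as the unit throughout; the names are hypothetical.

```python
# Seconds to read 1 MB, taken from the list above.
READ_1MB_SECONDS = {"RAM": 250e-6, "SSD": 1e-3, "HDD": 10e-3}


def implied_throughput_mb_s(seconds_per_mb):
    """Throughput in MB/s implied by a time-per-MB figure."""
    return 1.0 / seconds_per_mb


def scan_seconds(size_mb, seconds_per_mb):
    """Estimated time to read size_mb sequentially, assuming linear scaling."""
    return size_mb * seconds_per_mb


for medium, seconds in READ_1MB_SECONDS.items():
    mb_s = implied_throughput_mb_s(seconds)
    one_gb = scan_seconds(1000, seconds)
    print(f"{medium}: ~{mb_s:.0f} MB/s, 1 GB in ~{one_gb:.2f} s")
```

Under these assumptions, scanning 1 GB takes roughly a quarter of a second from RAM, one second from SSD, and ten seconds from HDD.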


Network Latencies

  • 1 KB data transfer on 1 Gbps network: ~10 microseconds

  • Round trip within the same AZ: < 1 millisecond

  • Round trip across AZs (same region): ~1-2 milliseconds

  • Round trip between two data centers (different regions): ~60-200 milliseconds, depending on distance

  • Intercontinental round trip: ~150 milliseconds, depending on distance
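The 1 KB transfer figure is the time needed to put the bits on the wire, separate from any round-trip delay. A rough sketch, assuming 1 KB = 1000 bytes and ignoring protocol headers (the function name is hypothetical):

```python
def serialization_delay_us(payload_bytes, link_bits_per_sec):
    """Microseconds needed to push payload_bytes onto a link of the given speed."""
    return payload_bytes * 8 * 1_000_000 / link_bits_per_sec


one_kb_on_1gbps = serialization_delay_us(1000, 1e9)
one_mb_on_25gbps = serialization_delay_us(1_000_000, 25e9)

print(f"1 KB on 1 Gbps: {one_kb_on_1gbps:.1f} us")
print(f"1 MB on 25 Gbps: {one_mb_on_25gbps:.1f} us")
```

The first result is 8 µs, in line with the ~10 microsecond figure in the list. For small payloads this is far below even a same-AZ round trip, so round-trip latency, not bandwidth, is usually the cost that matters.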


Cloud Services

  • API gateway call latency: ~1-10 milliseconds

  • Query on a NoSQL database (e.g., DynamoDB): ~5-20 milliseconds

  • Query on an SQL database: ~5-10 milliseconds for simple queries; complex queries can take seconds.


FAQ

  1. A main memory (RAM) reference takes 100 nanoseconds, but reading 1 MB from RAM takes 250 microseconds. Why?

    Answer:

    100 ns: Time to access a single memory location (latency), i.e., to fetch one small chunk of data (e.g., a 64-byte cache line).

    250 µs: Time to read 1 MB, including latency and transfer time.

    • Modern RAM modules have high bandwidth, often in the range of tens of GB/s. For example:

    • Assume a memory bandwidth of 20 GB/s (DDR4/DDR5 range).

    • Time to transfer 1 MB = 1 MB / (20 GB/s) = 2^20 bytes / (20 × 10^9 bytes/s) ≈ 50 μs

    However, the transfer also incurs latency overheads for accessing multiple addresses and managing the memory bus, which is why the commonly quoted total for reading 1 MB is closer to ~250 microseconds (an effective rate of about 4 GB/s) than to the raw bandwidth estimate.

  2. Disk I/O (HDD, seek) takes 10 milliseconds, and reading 1 MB from HDD also takes 10 milliseconds. Why?

    Answer:

    Disk I/O refers to the seek time, which is the delay required for the hard disk drive (HDD) to position its read/write head over the correct track on the spinning disk. This latency happens before any data is read and is independent of the data size.

    Reading 1 MB from HDD: ~10 milliseconds

    • Two separate costs are involved in reading 1 MB of data from the disk:

      1. Seek time (~10 ms): Positioning the read head.

      2. Data transfer time: Time to physically transfer 1 MB from the spinning disk to memory.

      • Modern HDDs have sequential read speeds of ~100 MB/s. Therefore:

        • Transfer time for 1 MB = 1 MB / (100 MB/s) = 0.01 seconds = 10 ms

    For small reads (e.g., a few KB or even 1 byte), the seek time dominates, so the total latency is still close to 10 ms.

    For larger reads (e.g., 1 MB), the ~10 ms figure is the transfer time of a sequential read, where the head is already positioned. A random 1 MB read pays for both the seek and the transfer, so it takes closer to ~20 ms.
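Both FAQ answers can be read as the same two-term model: a fixed access latency plus size divided by throughput. The sketch below is an illustration under the figures quoted in the answers (100 ns and 20 GB/s for RAM, 10 ms and 100 MB/s for HDD); the function name is hypothetical, and the model ignores any extra bus or controller overhead.

```python
def read_time_seconds(size_bytes, access_latency_s, bandwidth_bytes_per_s):
    """Fixed access latency plus transfer time at the given bandwidth."""
    return access_latency_s + size_bytes / bandwidth_bytes_per_s


# RAM: 100 ns access latency, 20 GB/s bandwidth, 1 MB = 2^20 bytes.
ram_1mb_us = read_time_seconds(2**20, 100e-9, 20e9) * 1e6

# HDD: 10 ms seek, 100 MB/s sequential throughput, 1 MB = 10^6 bytes.
hdd_4kb_ms = read_time_seconds(4096, 10e-3, 100e6) * 1e3
hdd_1mb_ms = read_time_seconds(1_000_000, 10e-3, 100e6) * 1e3

print(f"RAM, 1 MB: ~{ram_1mb_us:.1f} us")
print(f"HDD, 4 KB: ~{hdd_4kb_ms:.2f} ms")
print(f"HDD, 1 MB with a seek: ~{hdd_1mb_ms:.1f} ms")
```

For RAM the latency term is negligible next to the transfer term, while for a small HDD read the seek is nearly the whole cost. A 1 MB HDD read that pays both terms comes to about 20 ms in this model; the ~10 ms figure matches the transfer term alone.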

Anish Ratnawat's Tech Blog
