<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Anish Ratnawat's Tech Blog]]></title><description><![CDATA[Anish Ratnawat's Tech Blog]]></description><link>https://anishratnawat.com</link><generator>RSS for Node</generator><lastBuildDate>Thu, 16 Apr 2026 10:50:05 GMT</lastBuildDate><atom:link href="https://anishratnawat.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Model Context Protocol (MCP) -- Overview & Performance Benchmarks]]></title><description><![CDATA[What is MCP?
The Model Context Protocol (MCP) is an open standard created by Anthropic that provides a universal interface for connecting AI models to external data sources, tools, and services.
Think]]></description><link>https://anishratnawat.com/model-context-protocol-mcp-overview-performance-benchmarks</link><guid isPermaLink="true">https://anishratnawat.com/model-context-protocol-mcp-overview-performance-benchmarks</guid><category><![CDATA[mcp]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Tue, 07 Apr 2026 15:23:29 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/64da397a01c2b50cc13d9656/8f7809ee-8b12-4e6f-9691-fb402a730bf3.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>What is MCP?</h2>
<p>The <strong>Model Context Protocol (MCP)</strong> is an open standard created by Anthropic that provides a <strong>universal interface</strong> for connecting AI models to external data sources, tools, and services.</p>
<p>Think of it as a <strong>USB-C port for AI</strong> -- one standardized protocol instead of custom integrations for every tool.</p>
<h3>Core Capabilities</h3>
<table>
<thead>
<tr>
<th>Capability</th>
<th>Description</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Tool Execution</strong></td>
<td>Let LLMs call functions, APIs, and services in a controlled way</td>
</tr>
<tr>
<td><strong>Resource Access</strong></td>
<td>Expose files, databases, and live data to AI models</td>
</tr>
<tr>
<td><strong>Prompt Templates</strong></td>
<td>Share reusable prompt templates &amp; workflows across clients</td>
</tr>
<tr>
<td><strong>Sampling</strong></td>
<td>Servers can request LLM completions back through the client</td>
</tr>
</tbody></table>
<hr />
<h2>MCP Architecture</h2>
<pre><code>Host               MCP Client           MCP Server          Data Sources
(Claude Desktop,   (1:1 connection      (Exposes tools,     (APIs, DBs,
 IDE, custom app)   per server)          resources &amp;          filesystems,
                                         prompts)            SaaS services)
      |                  |                    |                    |
      | ──── creates ──&gt; |                    |                    |
      |                  | ── JSON-RPC 2.0 ─&gt; |                    |
      |                  |                    | ── queries/calls ─&gt;|
      |                  |                    | &lt;── responses ──── |
      |                  | &lt;── responses ──── |                    |
      | &lt;── displays ─── |                    |                    |
</code></pre>
<ul>
<li><strong>Host</strong> -- The user-facing application (e.g. Claude Desktop, VS Code, a custom app). Creates and manages MCP clients.</li>
<li><strong>Client</strong> -- Lives inside the host. Each client holds a stateful 1:1 session with one MCP server. Handles capability negotiation and message routing.</li>
<li><strong>Server</strong> -- A lightweight process that exposes tools, resources, and prompts over the MCP protocol. Can be local or remote.</li>
</ul>
<hr />
<h2>MCP Transport Modes</h2>
<h3>1. stdio (Local Only)</h3>
<p>Communication over <strong>standard input/output</strong> streams. The host spawns the server as a child process. Simplest setup -- no networking needed.</p>
<p><strong>Best for:</strong> Local tools, CLI integrations, IDE extensions, development workflows.</p>
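<p>To make the stdio flow concrete, here is a hedged, minimal sketch (plain stdlib, not the official MCP SDK) of a host exchanging one newline-delimited JSON-RPC message with a spawned child process:</p>

```python
import json, subprocess, sys

# Toy stand-in for an MCP server: reads one JSON-RPC request from stdin,
# writes one response to stdout. Real servers would use an MCP SDK.
child_src = (
    "import json, sys\n"
    "req = json.loads(sys.stdin.readline())\n"
    "resp = {'jsonrpc': '2.0', 'id': req['id'], 'result': {'tools': []}}\n"
    "print(json.dumps(resp), flush=True)\n"
)

# The host spawns the server as a child process -- no networking involved
proc = subprocess.Popen(
    [sys.executable, "-c", child_src],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
proc.stdin.write(json.dumps(request) + "\n")
proc.stdin.flush()
response = json.loads(proc.stdout.readline())
proc.wait()
```

<p>The transport is nothing more than line-delimited JSON over the child's stdin/stdout, which is why stdio is the simplest mode to set up.</p>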
<h3>2. SSE -- HTTP + Server-Sent Events (Remote / Legacy)</h3>
<p>Client sends requests via <strong>HTTP POST</strong> and receives streaming responses over an <strong>SSE channel</strong>. Works over the network.</p>
<p><strong>Best for:</strong> Remote servers, web-based clients, existing HTTP infrastructure.</p>
<h3>3. Streamable HTTP (Recommended)</h3>
<p>The <strong>latest spec</strong> transport. Pure HTTP with optional streaming via SSE. Supports both stateful sessions and stateless request/response patterns.</p>
<p><strong>Best for:</strong> Production deployments, scalable architectures, cloud-native services.</p>
<blockquote>
<p>All transports use <strong>JSON-RPC 2.0</strong> as the message format. The protocol supports three message types: <strong>requests</strong> (expect response), <strong>responses</strong> (reply to request), and <strong>notifications</strong> (fire-and-forget).</p>
</blockquote>
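<p>The three message shapes look like this (payload contents are illustrative):</p>

```python
import json

# Request: expects a response, so it carries an "id"
request = {
    "jsonrpc": "2.0", "id": 7,
    "method": "tools/call",
    "params": {"name": "fetch_external_data", "arguments": {"url": "https://example.com"}},
}

# Response: reply to a request, echoing the same "id"
response = {
    "jsonrpc": "2.0", "id": 7,
    "result": {"content": [{"type": "text", "text": "ok"}]},
}

# Notification: fire-and-forget, so it has no "id" at all
notification = {
    "jsonrpc": "2.0",
    "method": "notifications/progress",
    "params": {"progress": 0.5},
}

wire = json.dumps(request)  # what actually crosses the transport
```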
<hr />
<h2>Performance Benchmarks</h2>
<h3>Test Overview</h3>
<table>
<thead>
<tr>
<th>Metric</th>
<th>Value</th>
</tr>
</thead>
<tbody><tr>
<td>Total Requests</td>
<td><strong>3.9 million</strong></td>
</tr>
<tr>
<td>Error Rate</td>
<td><strong>0%</strong> (all implementations)</td>
</tr>
<tr>
<td>Languages Tested</td>
<td>Java, Go, Node.js, Python</td>
</tr>
<tr>
<td>Test Rounds</td>
<td>3 independent runs</td>
</tr>
</tbody></table>
<hr />
<h3>Benchmark Tools Used</h3>
<p>Each MCP server implemented 4 tool types covering different workload profiles:</p>
<table>
<thead>
<tr>
<th>Tool</th>
<th>Category</th>
<th>Description</th>
</tr>
</thead>
<tbody><tr>
<td><code>calculate_fibonacci</code></td>
<td>CPU-Bound</td>
<td>Pure computation. Calculates Fibonacci numbers to stress-test raw CPU performance and function call overhead with no I/O.</td>
</tr>
<tr>
<td><code>fetch_external_data</code></td>
<td>I/O-Bound</td>
<td>Network I/O. Simulates fetching data from an external API to measure async I/O and network latency handling.</td>
</tr>
<tr>
<td><code>process_json_data</code></td>
<td>Data Processing</td>
<td>Serialization. Parses, transforms, and serializes JSON payloads to benchmark memory allocation, parsing speed, and GC pressure.</td>
</tr>
<tr>
<td><code>simulate_database_query</code></td>
<td>Latency-Sensitive</td>
<td>Simulated DB query with ~10 ms built-in delay. Measures overhead each runtime adds on top of a fixed-latency operation.</td>
</tr>
</tbody></table>
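<p>The benchmark harness itself isn't reproduced here, but the workload profiles are easy to picture. A hedged Python sketch of three of the four tools (function names mirror the table; the bodies are my assumptions, not the actual benchmark code):</p>

```python
import asyncio, json, time

def calculate_fibonacci(n: int) -> int:
    # CPU-bound: pure iterative computation, no I/O
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def process_json_data(payload: str) -> str:
    # Data processing: parse, transform, re-serialize
    data = json.loads(payload)
    data["processed"] = True
    return json.dumps(data)

async def simulate_database_query(query: str) -> dict:
    # Latency-sensitive: a fixed ~10 ms sleep stands in for a DB round-trip,
    # so any latency measured beyond 10 ms is pure runtime overhead
    start = time.perf_counter()
    await asyncio.sleep(0.010)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {"query": query, "elapsed_ms": round(elapsed_ms, 2)}
```

<p>The fixed sleep in the last tool is what makes the "Tool-Specific Latency" table below meaningful: Java's 10.37 ms on that tool is ~0.37 ms of runtime overhead, while Python's 42.57 ms is ~32 ms.</p>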
<hr />
<h3>Latency &amp; Throughput</h3>
<table>
<thead>
<tr>
<th>Server</th>
<th>Avg Latency</th>
<th>p95 Latency</th>
<th>Throughput (RPS)</th>
<th>Total Requests</th>
<th>Error Rate</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Java</strong></td>
<td>0.835 ms</td>
<td>10.19 ms</td>
<td>1,624</td>
<td>1,559,520</td>
<td>0%</td>
</tr>
<tr>
<td><strong>Go</strong></td>
<td>0.855 ms</td>
<td>10.03 ms</td>
<td>1,624</td>
<td>1,558,000</td>
<td>0%</td>
</tr>
<tr>
<td><strong>Node.js</strong></td>
<td>10.66 ms</td>
<td>53.24 ms</td>
<td>559</td>
<td>534,150</td>
<td>0%</td>
</tr>
<tr>
<td><strong>Python</strong></td>
<td>26.45 ms</td>
<td>73.23 ms</td>
<td>292</td>
<td>280,605</td>
<td>0%</td>
</tr>
</tbody></table>
<ul>
<li>Java &amp; Go deliver <strong>~3x</strong> the throughput of Node.js and <strong>~5.5x</strong> that of Python</li>
<li>On average latency, Python is <strong>~31x slower</strong> than Go/Java and Node.js is <strong>~12x slower</strong></li>
</ul>
<hr />
<h3>Resource Utilization</h3>
<table>
<thead>
<tr>
<th>Server</th>
<th>Avg CPU</th>
<th>Avg Memory</th>
<th>RPS per MB Memory</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Java</strong></td>
<td>28.8%</td>
<td>226 MB</td>
<td>7.2</td>
</tr>
<tr>
<td><strong>Go</strong></td>
<td>31.8%</td>
<td>18 MB</td>
<td><strong>92.6</strong></td>
</tr>
<tr>
<td><strong>Node.js</strong></td>
<td>98.7%</td>
<td>110 MB</td>
<td>5.1</td>
</tr>
<tr>
<td><strong>Python</strong></td>
<td>93.9%</td>
<td>98 MB</td>
<td>3.1</td>
</tr>
</tbody></table>
<ul>
<li>Go uses just <strong>18 MB</strong> of memory -- <strong>12.5x less</strong> than Java, with identical throughput</li>
<li>Go delivers <strong>12.8x</strong> more throughput per MB than Java -- crucial for container/K8s environments</li>
</ul>
<hr />
<h3>Tool-Specific Latency (ms)</h3>
<table>
<thead>
<tr>
<th>Tool</th>
<th>Java</th>
<th>Go</th>
<th>Node.js</th>
<th>Python</th>
</tr>
</thead>
<tbody><tr>
<td><code>calculate_fibonacci</code></td>
<td>0.369</td>
<td>0.388</td>
<td>7.11</td>
<td>30.83</td>
</tr>
<tr>
<td><code>fetch_external_data</code></td>
<td>1.316</td>
<td>1.292</td>
<td>19.18</td>
<td>80.92</td>
</tr>
<tr>
<td><code>process_json_data</code></td>
<td>0.352</td>
<td>0.443</td>
<td>7.48</td>
<td>34.24</td>
</tr>
<tr>
<td><code>simulate_database_query</code></td>
<td>10.37</td>
<td>10.71</td>
<td>26.71</td>
<td>42.57</td>
</tr>
</tbody></table>
<ul>
<li>DB-bound operations narrow the gap; compute &amp; I/O tasks show the widest spread</li>
</ul>
<hr />
<h2>Key Findings</h2>
<ol>
<li><strong>Java &amp; Go</strong> are effectively tied on latency and throughput -- both deliver sub-millisecond averages and <strong>1,624 RPS</strong>.</li>
<li><strong>Go's memory footprint</strong> is dramatically lower at <strong>18 MB</strong> vs Java's 226 MB -- a <strong>12.5x advantage</strong> for containerized workloads.</li>
<li><strong>Node.js &amp; Python</strong> consume &gt;93% CPU under load while Java and Go remain under 32%, leaving significant headroom.</li>
<li><strong>Node.js</strong> is 10-12x slower due to per-request MCP server instantiation for security isolation.</li>
<li><strong>All implementations</strong> achieved a <strong>0% error rate</strong> across 3.9M requests -- stability is not the differentiator.</li>
</ol>
<hr />
<h2>Production Recommendations</h2>
<h3>Go -- Cloud-Native &amp; Cost-Optimized</h3>
<p>Best for Kubernetes, horizontal scaling, and cloud deployments. 12.8x better memory efficiency than Java means fewer pods and lower infrastructure cost.</p>
<h3>Java -- Lowest Latency &amp; Mature Ecosystem</h3>
<p>Best when absolute lowest latency matters and your team needs a rich ecosystem for complex business logic. Higher memory cost is the trade-off.</p>
<h3>Node.js -- Moderate Traffic (&lt;500 RPS)</h3>
<p>Viable for teams with existing JavaScript expertise. Security-focused per-request isolation adds overhead -- acceptable at moderate scale.</p>
<h3>Python -- Dev / Test / Low Traffic</h3>
<p>Best suited for development, testing, prototyping, or very low-traffic scenarios (&lt;100 RPS). Not recommended for production workloads at scale.</p>
<hr />
<h2>Conclusion</h2>
<ul>
<li>For <strong>maximum efficiency</strong> --&gt; <strong>Go</strong></li>
<li>For <strong>lowest latency + ecosystem depth</strong> --&gt; <strong>Java</strong></li>
<li>For <strong>moderate loads with JS teams</strong> --&gt; <strong>Node.js</strong></li>
<li>Keep <strong>Python</strong> for <strong>dev &amp; prototyping</strong></li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Agent Harness: The Infrastructure Layer That Makes AI Actually Work]]></title><description><![CDATA[Table of Contents

The AI Reliability Problem

What Is an Agent Harness?

Harness vs. Orchestrator — What's the Difference?

The 5 Core Components of a Good Harness

Advanced Pattern: Persistent Memor]]></description><link>https://anishratnawat.com/agent-harness-the-infrastructure-layer-that-makes-ai-actually-work</link><guid isPermaLink="true">https://anishratnawat.com/agent-harness-the-infrastructure-layer-that-makes-ai-actually-work</guid><category><![CDATA[Harness]]></category><category><![CDATA[agent-harness]]></category><category><![CDATA[agent-bug-knowledge]]></category><category><![CDATA[agent-bug-memory]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Sat, 28 Mar 2026 05:00:00 GMT</pubDate><content:encoded><![CDATA[<hr />
<h2>Table of Contents</h2>
<ol>
<li><p><a href="#1-the-ai-reliability-problem">The AI Reliability Problem</a></p>
</li>
<li><p><a href="#2-what-is-an-agent-harness">What Is an Agent Harness?</a></p>
</li>
<li><p><a href="#3-harness-vs-orchestrator--whats-the-difference">Harness vs. Orchestrator — What's the Difference?</a></p>
</li>
<li><p><a href="#4-the-5-core-components-of-a-good-harness">The 5 Core Components of a Good Harness</a></p>
</li>
<li><p><a href="#5-advanced-pattern-persistent-memory">Advanced Pattern: Persistent Memory</a></p>
</li>
<li><p><a href="#6-advanced-pattern-bug-knowledge-base">Advanced Pattern: Bug Knowledge Base</a></p>
</li>
<li><p><a href="#7-real-world-examples-from-production">Real-World Examples from Production</a></p>
</li>
<li><p><a href="#8-harness-engineering-as-a-discipline">Harness Engineering as a Discipline</a></p>
</li>
<li><p><a href="#9-why-the-harness-is-the-moat-not-the-model">Why the Harness Is the Moat, Not the Model</a></p>
</li>
<li><p><a href="#10-the-future-self-optimizing-harnesses">The Future: Self-Optimizing Harnesses</a></p>
</li>
<li><p><a href="#11-conclusion">Conclusion</a></p>
</li>
</ol>
<hr />
<h2>1. The AI Reliability Problem</h2>
<p>You've seen the demos. An AI agent writes code, browses the web, makes decisions — all autonomously. It looks magical. Then you try to run it in production for a real task with 50+ steps, and it quietly goes off the rails.</p>
<p>This is the reliability gap. Models are getting smarter, but smart alone doesn't mean reliable. Benchmarks measure one-shot performance. Real production tasks are multi-step, long-running, and full of edge cases.</p>
<blockquote>
<p><strong>Think of it this way:</strong> A Formula 1 engine is incredible. But without a chassis, steering wheel, brakes, and tires, it doesn't go anywhere useful. The engine is the model. Everything else is the harness.</p>
</blockquote>
<p>The question developers need to answer in 2026 isn't "which model is best?" It's "how do we wrap models so they work reliably?" That's what an agent harness solves.</p>
<hr />
<h2>2. What Is an Agent Harness?</h2>
<p>An agent harness is the complete infrastructure that wraps around an AI model to manage long-running tasks. It is <strong>not the model itself</strong>. It is everything else the model needs to work reliably in the real world.</p>
<pre><code class="language-plaintext">Agent = Model + Harness
</code></pre>
<p>The model generates responses. The harness handles everything else: memory between sessions, which tools the model can access, guardrails that prevent catastrophic failures, the feedback loops that help it self-correct, and the observability layer that lets humans monitor what's happening.</p>
<p>If you've used Claude Code, you've experienced a harness. What makes it powerful isn't Claude alone — it's the harness around Claude: context management, filesystem controls, tool orchestration, session persistence, and the permission model that keeps it safe.</p>
<pre><code class="language-plaintext">harness/
├── context/        # memory, session state, compaction
├── tools/          # what the agent can do
├── guardrails/     # what the agent must not do
├── planner/        # how tasks are broken down
├── evaluator/      # checking output quality
└── lifecycle/      # start, handoff, end of sessions
</code></pre>
<hr />
<h2>3. Harness vs. Orchestrator — What's the Difference?</h2>
<p>This trips up a lot of developers. The terms sound similar but they operate at different layers.</p>
<table>
<thead>
<tr>
<th></th>
<th>Orchestrator</th>
<th>Harness</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Concern</strong></td>
<td>Logic and control flow</td>
<td>Capabilities and infrastructure</td>
</tr>
<tr>
<td><strong>Does what</strong></td>
<td>Decides what to do next</td>
<td>Gives the model its tools</td>
</tr>
<tr>
<td><strong>Manages</strong></td>
<td>Task sequencing, routing</td>
<td>Memory, context, side-effects</td>
</tr>
<tr>
<td><strong>Enforces</strong></td>
<td>Reasoning loop (ReAct, etc.)</td>
<td>Guardrails, permissions</td>
</tr>
<tr>
<td><strong>Analogy</strong></td>
<td>The brain of the operation</td>
<td>The hands and infrastructure</td>
</tr>
</tbody></table>
<p>They work together. The orchestrator says "invoke the model with this prompt." The harness ensures when the model is invoked, it has the right tools, context, and environment. You need both. Improving either one dramatically improves real-world performance.</p>
<hr />
<h2>4. The 5 Core Components of a Good Harness</h2>
<p>Production-grade harnesses are built around five key responsibilities. Neglect any one of them, and reliability breaks down.</p>
<h3>Component 1: Human-in-the-loop controls</h3>
<p>Agents must pause at high-stakes decisions. Deleting a database, charging a credit card, sending emails to customers -- these need human approval. A harness defines exactly where those checkpoints are and blocks execution until a human confirms.</p>
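<p>A minimal sketch of such a checkpoint (the tool names and the approval callback are illustrative, not from any specific framework):</p>

```python
# High-stakes tools that must block until a human confirms (illustrative set)
HIGH_STAKES = {"delete_database", "charge_card", "send_customer_email"}

def execute_tool(name: str, args: dict, approve=input) -> dict:
    """Run a tool, pausing for human approval on high-stakes actions."""
    if name in HIGH_STAKES:
        answer = approve(f"Agent wants {name}({args}). Allow? [y/N] ")
        if answer.strip().lower() != "y":
            return {"status": "blocked", "tool": name}
    # ... dispatch to the real tool implementation here ...
    return {"status": "executed", "tool": name}
```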
<h3>Component 2: Context and memory management</h3>
<p>LLMs have no memory between sessions by default. A harness solves this with context compaction, session handoff artifacts, and dynamic retrieval (RAG). Anthropic's harness maintains a <code>claude-progress.txt</code> log so long tasks can resume where they left off.</p>
<h3>Component 3: Tool call orchestration</h3>
<p>Bad orchestration creates infinite loops and cascading failures. Good harnesses define which tools are available, when to use them, the correct order, and how to handle errors gracefully. Vercel famously removed 80% of their agent's tools and got better results — fewer choices, fewer mistakes.</p>
<h3>Component 4: Sub-agent coordination</h3>
<p>Complex tasks need specialized agents. One researches, another writes, a third reviews. The harness manages communication between them, merges their outputs, and resolves conflicts.</p>
<h3>Component 5: Prompt preset management</h3>
<p>Different tasks need different instructions. A harness stores, versions, and selects the right system prompt for each task type — rather than pasting the same monolithic prompt everywhere.</p>
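<p>A sketch of what "stores, versions, and selects" can mean in practice (preset keys and contents are made up for illustration):</p>

```python
# Versioned prompt presets keyed by (task_type, version)
PRESETS = {
    ("bug_fix", "v1"): "You fix bugs.",
    ("bug_fix", "v2"): "You are a debugging agent. Reproduce the bug before patching.",
    ("research", "v1"): "Gather and cite sources before answering.",
}

def select_prompt(task_type: str, version=None) -> str:
    # Pin a version explicitly, or default to the newest one for the task type
    # (lexicographic max is fine for v1..v9; real systems need proper semver)
    if version is not None:
        return PRESETS[(task_type, version)]
    latest = max(v for (t, v) in PRESETS if t == task_type)
    return PRESETS[(task_type, latest)]
```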
<hr />
<h2>5. Advanced Pattern: Persistent Memory</h2>
<p>By default, every time you start a new session with an LLM, it has no idea who you are, what you worked on yesterday, or what bugs you fixed last week. The model is stateless. Persistent memory is the harness layer that fixes this.</p>
<p>This isn't about stuffing old conversations into the context window — that's expensive and hits limits fast. It's about selectively storing, indexing, and retrieving the right memories at the right time.</p>
<h3>The three layers of memory</h3>
<p><strong>L1 — In-context memory (ephemeral)</strong><br />What's currently in the context window. Fast, but lost when the session ends. The harness manages what lives here via compaction — summarizing older turns, dropping irrelevant tool outputs, keeping only what the model needs right now.</p>
<p><strong>L2 — External memory store (session-persistent)</strong><br />A vector database (Pinecone, pgvector, Chroma) or key-value store that survives session boundaries. The harness writes summaries, decisions, and facts here — and retrieves them via semantic search when starting a new session.</p>
<p><strong>L3 — Structured state (long-term)</strong><br />A progress file or structured JSON document the harness maintains across days. Anthropic's Claude Code harness uses a <code>claude-progress.txt</code> for exactly this — a human-readable, agent-writable log of what has been done, what is pending, and what decisions were made.</p>
<h3>How it works end-to-end</h3>
<ul>
<li><p><strong>Session start:</strong> Harness queries L2/L3 for relevant memories → injects them into the system prompt → agent starts informed, not blank.</p>
</li>
<li><p><strong>Session end:</strong> Harness extracts key facts, decisions, and unfinished tasks → compresses them → writes to L2/L3 → next session picks up exactly here.</p>
</li>
</ul>
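<p>The start/end loop above can be sketched with a plain JSON file standing in for the L3 progress document (file name and schema are my assumptions):</p>

```python
import json, pathlib

PROGRESS = pathlib.Path("progress.json")  # L3-style structured state

def start_session(task: str) -> str:
    # Inject prior state into the system prompt so the agent starts informed
    state = json.loads(PROGRESS.read_text()) if PROGRESS.exists() else {}
    pending = state.get("pending", [])
    return f"Task: {task}\nPending from last session: {pending}"

def end_session(done: list, pending: list) -> None:
    # Persist a compact summary for the next session to pick up
    PROGRESS.write_text(json.dumps({"done": done, "pending": pending}))
```

<p>A real harness would write summaries to a vector store as well (the L2 layer); this shows only the structured-state half of the loop.</p>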
<h3>Tradeoffs</h3>
<table>
<thead>
<tr>
<th>Advantages</th>
<th>Risks</th>
</tr>
</thead>
<tbody><tr>
<td>Agent builds team-wide context over time</td>
<td>Stale memories can mislead the agent</td>
</tr>
<tr>
<td>No repeated re-explanation across sessions</td>
<td>Retrieval quality depends on embedding model</td>
</tr>
<tr>
<td>Faster task start — agent arrives informed</td>
<td>Privacy: memories may contain sensitive code</td>
</tr>
<tr>
<td>Survives model swaps — memory is external</td>
<td>Storage grows unboundedly without pruning</td>
</tr>
<tr>
<td>Human-readable audit trail of decisions</td>
<td>Debugging retrieval failures is non-trivial</td>
</tr>
</tbody></table>
<blockquote>
<p><strong>Critical implementation note:</strong> Never inject all memories — inject only what's relevant to the current task. Keep retrieved memory injections under ~500 tokens. Use tags and metadata filters aggressively.</p>
</blockquote>
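<p>One way to enforce that budget, sketched with an assumed memory schema (<code>text</code>, <code>tags</code>, <code>score</code>) and a rough 4-characters-per-token heuristic:</p>

```python
def select_memories(memories: list, tags: set, budget_tokens: int = 500) -> str:
    # Tag-filter first, then pack highest-scoring memories under the budget
    relevant = [m for m in memories if set(m["tags"]) & tags]
    chosen, used = [], 0
    for m in sorted(relevant, key=lambda m: -m["score"]):
        cost = len(m["text"]) // 4  # rough chars-per-token estimate
        if used + cost > budget_tokens:
            continue  # skip oversized entries; smaller ones may still fit
        chosen.append(m["text"])
        used += cost
    return "\n".join(chosen)
```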
<hr />
<h2>6. Advanced Pattern: Bug Knowledge Base</h2>
<p>Every developer has lived this: you spend three hours debugging a cryptic error, find the fix, close the ticket — and six months later a colleague hits the exact same bug and spends three hours on it too. The knowledge died with the PR comment.</p>
<p>A bug knowledge base is a harness component that captures bug-fix pairs at the moment of resolution and makes them retrievable — by the agent, for any future developer, automatically. This turns individual debugging effort into compounding team intelligence.</p>
<h3>The data model</h3>
<pre><code class="language-plaintext">BugEntry fields:
  bug_id          string    Linked GitHub issue or Jira ticket ID
  error_signature string    Canonical error message or symptom description
  root_cause      string    Why the bug occurred — human or agent-authored
  fix_summary     string    What was changed and why, in plain language
  diff            string    Actual code diff (sanitized of secrets)
  affected_files  string[]  Files involved — for scoped retrieval
  tags            string[]  e.g. ["auth", "race-condition", "postgres"]
  resolved_by     string    Developer or agent — for attribution
</code></pre>
<h3>The four capture points</h3>
<p><strong>Capture 1 — Merged PR (automatic)</strong><br />When a PR labelled <code>bug-fix</code> merges into <code>main</code>, a GitHub Actions webhook fires. An extraction agent reads the diff and PR description, structures it into a BugEntry, and writes it to the KB. Zero manual effort after the label is applied.</p>
<p><strong>Capture 2 — Agent runtime error (automatic)</strong><br />When an agent hits an exception while executing a task, the harness error hook intercepts it before the agent attempts a fix. It queries the KB for similar past bugs and injects the top matches into context.</p>
<p><strong>Capture 3 — Manual developer submission (on-demand)</strong><br />For bugs fixed outside normal PRs — hotfixes, config changes, infrastructure bugs, tribal knowledge — a developer submits directly via a CLI script or internal tool.</p>
<p><strong>Capture 4 — Post-fix agent write-back (automatic feedback loop)</strong><br />After the debugging agent resolves an issue, the harness writes the new bug-fix pair back to the KB. Every new fix the agent makes enriches the KB for the next run.</p>
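<p>A hedged sketch of the Capture-2 error hook. The <code>step</code> callable and the KB API are assumptions, and the keyword-overlap scoring is a deliberately naive stand-in for the full-text or vector retrieval described in the tiers below:</p>

```python
class BugKB:
    def __init__(self, entries: list):
        self.entries = entries  # BugEntry dicts, as in the data model above

    def retrieve_similar(self, error: str, k: int = 3) -> list:
        # Naive keyword-overlap ranking -- a stand-in for FTS/vector search
        words = set(error.lower().split())
        def score(e):
            return len(words & set(e["error_signature"].lower().split()))
        return sorted(self.entries, key=score, reverse=True)[:k]

def run_with_kb(step, task: str, kb: BugKB):
    # Intercept the failure, fetch similar past bugs, retry once with hints
    try:
        return step(task, hint=None)
    except Exception as exc:
        matches = kb.retrieve_similar(str(exc))
        hint = "\n".join(m["fix_summary"] for m in matches)
        return step(task, hint=hint)
```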
<hr />
<h3>Choosing your storage backend — four tiers</h3>
<p>The vector DB is just one option. Start at Tier 1 and migrate only when you feel the limitations. These tiers are additive — Markdown is always the source of truth.</p>
<h4>Tier 1 — Markdown files in the repo ✅ Recommended to start</h4>
<p>One <code>.md</code> file per bug inside a <code>.bugs/</code> directory. Git-native, versioned, PR-reviewable, human-editable. Agent retrieves via grep.</p>
<p><strong>When to use:</strong> Small teams (&lt;10 devs), &lt;300 bugs, or just getting started. Zero friction.</p>
<pre><code class="language-markdown">&lt;!-- .bugs/BUG-2026-042.md --&gt;

## Error
TypeError: Cannot read properties of undefined (reading 'token')
at AuthMiddleware.verify (src/auth/middleware.ts:34)

## Root cause
JWT refresh ran before user session was hydrated.
Race condition between session.init() and token.verify().

## Fix
Awaited session.init() before token.verify() in middleware.
Added guard: if (!session.ready) throw new SessionNotReady().

## Files changed
src/auth/middleware.ts, src/session/index.ts

## Tags
auth, race-condition, jwt, async

## Resolved by
@priya — 2026-03-18 — BUG-2026-042
</code></pre>
<pre><code class="language-python"># Harness retrieval — grep across .bugs/
import subprocess, pathlib

def retrieve_md(error: str, bugs_dir=".bugs") -&gt; str:
    # -F searches for fixed strings, so regex metacharacters in error
    # messages (parentheses, brackets, dots) can't break the match
    keywords = error.split()[:6]
    hits = set()
    for kw in keywords:
        out = subprocess.run(
            ["grep", "-rlF", kw, bugs_dir],
            capture_output=True, text=True
        ).stdout.strip()
        hits.update(out.splitlines())
    docs = [pathlib.Path(p).read_text() for p in sorted(hits)[:3]]
    return "\n\n---\n\n".join(docs)
</code></pre>
</code></pre>
<table>
<thead>
<tr>
<th>Advantages</th>
<th>Limitations</th>
</tr>
</thead>
<tbody><tr>
<td>Lives in repo — versioned in Git</td>
<td>Keyword retrieval only — no semantic</td>
</tr>
<tr>
<td>PRs review the KB alongside code</td>
<td>Slow at scale (500+ files)</td>
</tr>
<tr>
<td>Devs read and edit directly</td>
<td>Misses paraphrase matches</td>
</tr>
<tr>
<td>Zero infra — works offline</td>
<td>Duplicate detection is manual</td>
</tr>
</tbody></table>
<hr />
<h4>Tier 2 — Markdown source + SQLite FTS5 index ✅ Recommended at scale</h4>
<p>Markdown stays the human-readable source of truth. A SQLite FTS5 database provides fast full-text search. Index is rebuilt by CI when <code>.bugs/*.md</code> files change. No external services.</p>
<p><strong>When to use:</strong> 300–2000 bugs, or when grep is getting slow.</p>
<pre><code class="language-python"># index_bugs.py — run in CI on .bugs/ changes
import sqlite3, glob, pathlib, re

def parse_md_sections(text: str) -&gt; dict:
    # Split "## Heading\nbody" markdown into a {heading: body} dict
    pattern = r"^## (.+?)\n(.*?)(?=^## |\Z)"
    return {m.group(1).strip(): m.group(2).strip()
            for m in re.finditer(pattern, text, re.M | re.S)}

conn = sqlite3.connect("bugs.db")
# FTS5 tables have no unique constraints, so rebuild from scratch each run
conn.execute("DROP TABLE IF EXISTS bugs")
conn.execute("""
  CREATE VIRTUAL TABLE bugs USING fts5(
    bug_id, error_signature, root_cause, fix_summary, tags
  )""")

for path in glob.glob(".bugs/*.md"):
    text = pathlib.Path(path).read_text()
    sections = parse_md_sections(text)
    conn.execute(
        "INSERT INTO bugs VALUES (?,?,?,?,?)",
        (pathlib.Path(path).stem,
         sections.get("Error", ""), sections.get("Root cause", ""),
         sections.get("Fix", ""),   sections.get("Tags", ""))
    )
conn.commit()

# Retrieval — FTS5 ranked full-text search
def retrieve_fts(query: str):
    rows = conn.execute(
        "SELECT * FROM bugs WHERE bugs MATCH ? "
        "ORDER BY rank LIMIT 3", (query,)
    ).fetchall()
    return rows
</code></pre>
<table>
<thead>
<tr>
<th>Advantages</th>
<th>Limitations</th>
</tr>
</thead>
<tbody><tr>
<td>Markdown stays editable and readable</td>
<td>Index must rebuild when files change</td>
</tr>
<tr>
<td>SQLite is a single local file — no server</td>
<td>Still keyword-based — not semantic</td>
</tr>
<tr>
<td>FTS5 is very fast at 10,000+ entries</td>
<td>Two things to keep in sync</td>
</tr>
<tr>
<td>No embedding model or API needed</td>
<td>Misses paraphrase matches</td>
</tr>
</tbody></table>
<hr />
<h4>Tier 3 — JSONL flat file (Agent-heavy teams)</h4>
<p>One JSON object per line, append-only. Best as the agent write target, with markdown as the human read layer.</p>
<p><strong>When to use:</strong> Agents are writing frequently, or you want zero-overhead append writes.</p>
<pre><code class="language-python">import json

# Retrieval by tag filter
def retrieve_jsonl(query_tags: list, path="bugs.jsonl"):
    results = []
    with open(path) as f:
        for line in f:
            bug = json.loads(line)
            if any(t in bug["tags"] for t in query_tags):
                results.append(bug)
    return results[:3]

# Write — agent appends after fixing a bug
def store_jsonl(entry: dict, path="bugs.jsonl"):
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
</code></pre>
<hr />
<h4>Tier 4 — Vector database (1000+ bugs, semantic search)</h4>
<p>Embeddings-based similarity search. Finds bugs even when wording differs. Add this on top of markdown, never instead of it.</p>
<p><strong>When to use:</strong> 1000+ bugs, or when keyword search misses too many relevant matches.</p>
<pre><code class="language-python">import json

import chromadb
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
db = chromadb.PersistentClient(path="./bug_kb")
bugs = db.get_or_create_collection("bug_entries")

def store_vector(entry: dict):
    text = entry["error_signature"] + " " + entry["root_cause"]
    embedding = encoder.encode(text).tolist()
    bugs.add(
        ids=[entry["bug_id"]],
        embeddings=[embedding],
        documents=[json.dumps(entry)],
        metadatas=[{"tags": json.dumps(entry["tags"])}]
    )

def retrieve_vector(error: str, k=3) -&gt; list:
    embedding = encoder.encode(error).tolist()
    results = bugs.query(query_embeddings=[embedding], n_results=k)
    return [json.loads(d) for d in results["documents"][0]]
</code></pre>
<hr />
<h3>Decision guide</h3>
<table>
<thead>
<tr>
<th>Situation</th>
<th>Recommended approach</th>
</tr>
</thead>
<tbody><tr>
<td>Greenfield, any size team</td>
<td>Start with Markdown files</td>
</tr>
<tr>
<td>Grep getting slow (&gt;300 bugs)</td>
<td>Add SQLite FTS5 index</td>
</tr>
<tr>
<td>Agent writing frequently</td>
<td>JSONL as write target + Markdown for humans</td>
</tr>
<tr>
<td>Semantic misses (&gt;1000 bugs)</td>
<td>Add Vector DB on top of Markdown</td>
</tr>
<tr>
<td>Migrating tiers later</td>
<td>Markdown is always the source of truth</td>
</tr>
</tbody></table>
<blockquote>
<p><strong>Don't start with a vector DB.</strong> The bottleneck on a young codebase is knowledge capture, not retrieval speed. A <code>.bugs/</code> folder with 50 markdown files and a grep retriever will outperform an over-engineered vector store from day one.</p>
</blockquote>
<hr />
<h2>7. Real-World Examples from Production</h2>
<h3>Anthropic — Claude Code: the three-agent harness</h3>
<p>Anthropic uses a multi-agent harness for long-running coding tasks. One agent plans, one generates code, and one evaluates quality. Context resets are paired with structured handoff artifacts — so the next agent starts from a known, clean state. This solved the classic problem of context drift over multi-hour sessions.</p>
<h3>Manus: 5 harness rewrites, same model, 5× better reliability</h3>
<p>Manus rewrote their harness architecture five times in six months. The underlying model didn't change. Each rewrite improved task completion rates purely through better structure: smarter context handling, tighter tool definitions, and cleaner sub-agent coordination. The model was never the bottleneck.</p>
<h3>Microsoft — Azure SRE Agent: 40.5 hours to 3 minutes</h3>
<p>Microsoft's SRE agent harness wires MCP tools, telemetry, code repos, and incident management into a single pipeline. "Intent Met" score rose from 45% to 75% on novel incidents after shifting from bespoke tooling to a file-based context system. The system has handled 35,000+ production incidents autonomously.</p>
<h3>Vercel: subtraction as harness improvement</h3>
<p>Vercel's team removed 80% of their agent's available tools. The result: fewer steps, fewer tokens, faster responses, and higher task success rate. Right-sizing the toolset is a harness decision, not a model decision.</p>
<hr />
<h2>8. Harness Engineering as a Discipline</h2>
<p>Harness engineering is now a standalone discipline — distinct from MLOps and DevOps, though it borrows from both.</p>
<ul>
<li><p><strong>MLOps</strong> — model performance over time (training, deployment, retraining)</p>
</li>
<li><p><strong>DevOps</strong> — software delivery pipelines (CI/CD, infrastructure)</p>
</li>
<li><p><strong>Harness engineering</strong> — agent behavior in real-time execution, right now, on this task</p>
</li>
</ul>
<p>Key tools as of early 2026:</p>
<pre><code class="language-plaintext">Claude Agent SDK   → general-purpose harness, built-in context mgmt
CrewAI Flows       → event-driven multi-agent orchestration
LangChain          → composable harness primitives
AutoHarness        → automated harness engineering (6-step governance)
AutoAgent          → meta-agent that writes and optimizes its own harness
</code></pre>
<blockquote>
<p><strong>Emerging role:</strong> "Harness engineer" is entering job descriptions at companies building agent-powered products. The skillset combines software engineering with AI-specific knowledge of context management, prompt design, and agent evaluation.</p>
</blockquote>
<hr />
<h2>9. Why the Harness Is the Moat, Not the Model</h2>
<p>The model is becoming a commodity. Claude, GPT, Gemini — on static benchmarks, the gap is shrinking fast. The real differentiation is now infrastructure.</p>
<table>
<thead>
<tr>
<th>Metric</th>
<th>Result</th>
</tr>
</thead>
<tbody><tr>
<td>Manus harness rewrites, same model</td>
<td>5× reliability improvement</td>
</tr>
<tr>
<td>Tools Vercel removed to improve reliability</td>
<td>80%</td>
</tr>
<tr>
<td>Microsoft SRE "Intent Met" score improvement</td>
<td>+30%</td>
</tr>
<tr>
<td>Benchmark swing from harness setup alone</td>
<td>5+ points</td>
</tr>
</tbody></table>
<p>All of these came from changing the harness — not the model. You can fine-tune a competitive model in weeks. Building production-ready harnesses takes months or years. That's the moat.</p>
<hr />
<h2>10. The Future: Self-Optimizing Harnesses</h2>
<p>AutoAgent (April 2026) lets a meta-agent build and iterate on a harness autonomously overnight — modifying the system prompt, tools, and orchestration, running benchmarks, and keeping only changes that improve scores. In a 24-hour run, it hit #1 on SpreadsheetBench (96.5%) and top score on TerminalBench (55.1%) — beating every hand-engineered entry.</p>
<p>The human's job shifted from "engineer who edits <code>agent.py</code>" to "director who writes <code>program.md</code>."</p>
<p>Looking ahead: harnesses will become the primary tool for solving model drift — detecting exactly when a model stops reasoning correctly after its 100th step, feeding that data back into training. We're heading toward a convergence of training and inference environments, and the harness is at the center of that shift.</p>
<hr />
<h2>11. Conclusion</h2>
<p>2025 proved agents could work. 2026 is about making them work reliably at scale. The model is a component. The harness is the system.</p>
<p>Constrain what agents can do. Inform them about what they should do. Verify their work. Correct their mistakes. Keep humans in the loop at high-stakes decisions. This is harness engineering — and it's the most important infrastructure skill for developers building AI products right now.</p>
<p>The engine matters. But the car is what wins races.</p>
<hr />
<p>[Thoughts by Anish, rephrased by Claude]</p>
<p><em>Tags: agent-harness · AI infrastructure · LLM · harness-engineering · claude-code · multi-agent · developer</em></p>
]]></content:encoded></item><item><title><![CDATA[Consistent Hashing: Explained with Implementation Steps]]></title><description><![CDATA[In distributed systems, managing data placement and load balancing efficiently is crucial. One powerful tool for addressing these challenges is consistent hashing. This blog will explore consistent hashing, provide an example, and discuss why it is s...]]></description><link>https://anishratnawat.com/consistent-hashing-explained-with-implementation-steps</link><guid isPermaLink="true">https://anishratnawat.com/consistent-hashing-explained-with-implementation-steps</guid><category><![CDATA[consistent hashing]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Fri, 07 Mar 2025 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1736843099451/a023c638-61b2-4202-9217-fa8b55159142.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In distributed systems, managing data placement and load balancing efficiently is crucial. One powerful tool for addressing these challenges is <strong>consistent hashing</strong>. This blog will explore consistent hashing, provide an example, and discuss why it is superior to other approaches in certain scenarios.</p>
<h1 id="heading-what-is-consistent-hashing">What is Consistent Hashing?</h1>
<p>To understand consistent hashing, it is helpful to first examine <strong>traditional hashing</strong> and its limitations.</p>
<h2 id="heading-traditional-hashing">Traditional Hashing:</h2>
<p>In traditional hashing, a <strong>hash function</strong> maps keys directly to buckets (nodes).</p>
<p>A hash function is any function that maps values from an arbitrarily sized domain to a fixed-size domain, usually called the hash space. The values these functions produce are typically used as keys to enable efficient lookups of the original entity.</p>
<p><strong>For example:</strong></p>
<ul>
<li><p>Suppose you have a <strong>Distributed File Store system</strong> where users can <strong><em>upload</em></strong> and <strong><em>read files</em></strong>. The files are stored across <strong><em>4 servers</em></strong>, and the hash function assigns files to servers using the formula <code>hash(fileName) % number_of_servers</code>.</p>
<ul>
<li>If the number of servers is 4, and <code>hash(fileName)</code> returns 9 for a specific file (e.g., "image.png"), it will be stored on server <code>9 % 4 = 1</code>. Similarly, a request to read "image.png" will be routed to server 1 using the same logic.</li>
</ul>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736840095751/e262b09d-34e7-41ad-be3c-7539677827fb.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-limitations-of-traditional-hashing">Limitations of Traditional Hashing:</h3>
<ol>
<li><p><strong>High Disruption with Node Changes:</strong></p>
<ul>
<li><p>If a new server is added or an existing server is removed, almost all keys need to be rehashed and redistributed.</p>
<ul>
<li>For example, consider a <strong>Distributed File Store system</strong> where users upload and read files, and files are distributed across 4 servers using <code>hash(fileName) % 4</code>. If a new server is added (making it 5 servers), the formula changes to <code>hash(fileName) % 5</code>. As a result, files previously mapped to a specific server will now likely be assigned to different servers. For instance, a file that was on Server 3 with <code>hash(fileName) % 4 = 3</code> might now be moved to Server 4 with <code>hash(fileName) % 5 = 4</code>.</li>
</ul>
</li>
<li><p>This results in significant overhead and potential performance degradation.</p>
</li>
</ul>
</li>
<li><p><strong>Load Imbalance:</strong></p>
<ul>
<li>If the hash function does not distribute keys evenly, some servers may become overloaded while others remain underutilized.</li>
</ul>
</li>
<li><p><strong>Scalability Issues:</strong></p>
<ul>
<li>Scaling up or down in response to load is not seamless due to the need for global rehashing.</li>
</ul>
</li>
</ol>
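<p>This disruption is easy to quantify. The Python sketch below (illustrative only, using MD5 so the result is stable across runs) hashes 10,000 file names with mod-4 and then mod-5 and counts how many change servers:</p>
<pre><code class="lang-python">import hashlib

def bucket(key, num_servers):
    # Stable hash so the result is reproducible across runs
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_servers

keys = [f"file-{i}.png" for i in range(10_000)]
moved = sum(1 for k in keys if bucket(k, 4) != bucket(k, 5))
print(f"{moved / len(keys):.0%} of keys moved")  # prints roughly "80% of keys moved"
</code></pre>
<p>A key stays on its server only when <code>h % 4 == h % 5</code>, which happens for just 4 of every 20 hash residues, so adding a single server relocates roughly 80% of all keys.</p>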
<h2 id="heading-how-consistent-hashing-is-better">How Consistent Hashing is Better:</h2>
<p>Consistent hashing addresses these issues by using a different approach. Instead of directly mapping keys to nodes, both <strong>keys and nodes are placed on a virtual ring</strong>. Keys are assigned to the nearest node in the clockwise direction.</p>
<p>When a node is added to the ring, only the keys that fall between the new node and its predecessor are remapped (they move from the new node's clockwise successor). Similarly, when a node is removed, only that node's keys are remapped, to its clockwise successor. This ensures that when a node is added or removed, only a subset of keys needs to be remapped, making the system more resilient to changes.</p>
<p><strong>The Key Idea:</strong></p>
<ul>
<li><p>The hash space is visualised as a circle (0 to 2^m - 1, where <em>m</em> is the number of bits in the hash).</p>
</li>
<li><p>Each node (e.g., server) is assigned a position on the circle using a hash function.</p>
</li>
<li><p>Each key is also assigned a position on the circle.</p>
</li>
<li><p>A key is assigned to the first node clockwise from its position.</p>
</li>
</ul>
<h3 id="heading-example">Example</h3>
<p>Suppose we have four servers (N1, N2, N3, and N4) in a distributed file storage system, where users upload and read files, and we use consistent hashing to distribute files.<br />We create three virtual nodes for each server so that load is distributed more evenly among them, reducing the risk of cascading failure.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736841766101/b8414e34-5198-4b09-a506-a68652a45df0.png" alt class="image--center mx-auto" /></p>
<ol>
<li><p><strong>Step 1: Assign Nodes to the Ring</strong></p>
<ul>
<li><p>Server N1's virtual nodes N1a, N1b, and N1c are hashed to positions 10, 60, and 80.</p>
</li>
<li><p>Server N2's virtual nodes N2a, N2b, and N2c are hashed to positions 30, 90, and 110.</p>
</li>
<li><p>Server N3's virtual nodes N3a, N3b, and N3c are hashed to positions 120, 20, and 50.</p>
</li>
<li><p>Server N4's virtual nodes N4a, N4b, and N4c are hashed to positions 40, 70, and 100.</p>
</li>
</ul>
</li>
<li><p><strong>Step 2: Map Files to the Ring</strong></p>
<ul>
<li><p>File F1 ("image1.png") is hashed to position 5.</p>
</li>
<li><p>File F2 ("doc1.pdf") is hashed to position 25.</p>
</li>
<li><p>File F3 ("video1.mp4") is hashed to position 70.</p>
</li>
</ul>
</li>
<li><p><strong>Step 3: Place Files</strong></p>
<ul>
<li><p>F1 (position 5) is assigned to Server N1a (first node clockwise).</p>
</li>
<li><p>F2 (position 25) is assigned to Server N2a.</p>
</li>
<li><p>F3 (position 70) is assigned to Server N4b.</p>
</li>
</ul>
</li>
</ol>
<p><strong>Adding a new Node</strong></p>
<p>Suppose a new server, N5, is added and hashed to position 27.</p>
<p>Only one file (F2, at position 25) is reassigned to N5, illustrating minimal disruption.</p>
<p><strong>Removing a Node</strong></p>
<p>Suppose Server N4 goes down and is removed. Only the keys mapped to Server N4 need to be remapped.<br />In this example, only file F3 (at position 70) is reassigned, to Server N1c, the next node clockwise.</p>
<hr />
<h3 id="heading-implementation-details">Implementation Details</h3>
<p>Here is a basic Java implementation of consistent hashing:</p>
<pre><code class="lang-java"><span class="hljs-keyword">import</span> java.security.MessageDigest;
<span class="hljs-keyword">import</span> java.security.NoSuchAlgorithmException;
<span class="hljs-keyword">import</span> java.util.*;

<span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ConsistentHashing</span> </span>{
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> <span class="hljs-keyword">int</span> numReplicas;
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> SortedMap&lt;Integer, String&gt; ring;

    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-title">ConsistentHashing</span><span class="hljs-params">(<span class="hljs-keyword">int</span> numReplicas)</span> </span>{
        <span class="hljs-keyword">this</span>.numReplicas = numReplicas;
        <span class="hljs-keyword">this</span>.ring = <span class="hljs-keyword">new</span> TreeMap&lt;&gt;();
    }

    <span class="hljs-comment">// Hash function - MD5</span>
    <span class="hljs-function"><span class="hljs-keyword">private</span> <span class="hljs-keyword">int</span> <span class="hljs-title">hash</span><span class="hljs-params">(String key)</span> </span>{
        <span class="hljs-keyword">try</span> {
            MessageDigest md = MessageDigest.getInstance(<span class="hljs-string">"MD5"</span>);
            <span class="hljs-keyword">byte</span>[] digest = md.digest(key.getBytes());
            <span class="hljs-keyword">return</span> ((digest[<span class="hljs-number">0</span>] &amp; <span class="hljs-number">0xFF</span>) &lt;&lt; <span class="hljs-number">24</span>) | ((digest[<span class="hljs-number">1</span>] &amp; <span class="hljs-number">0xFF</span>) &lt;&lt; <span class="hljs-number">16</span>) | ((digest[<span class="hljs-number">2</span>] &amp; <span class="hljs-number">0xFF</span>) &lt;&lt; <span class="hljs-number">8</span>) | (digest[<span class="hljs-number">3</span>] &amp; <span class="hljs-number">0xFF</span>);
        } <span class="hljs-keyword">catch</span> (NoSuchAlgorithmException e) {
            <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> RuntimeException(e);
        }
    }

    <span class="hljs-comment">// Adding new node in the ring</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">addNode</span><span class="hljs-params">(String node)</span> </span>{
        <span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> i = <span class="hljs-number">0</span>; i &lt; numReplicas; i++) {
            String replicaKey = node + <span class="hljs-string">":"</span> + i;
            ring.put(hash(replicaKey), node);
        }
    }

    <span class="hljs-comment">// Removing node from the ring</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">removeNode</span><span class="hljs-params">(String node)</span> </span>{
        <span class="hljs-keyword">for</span> (<span class="hljs-keyword">int</span> i = <span class="hljs-number">0</span>; i &lt; numReplicas; i++) {
            String replicaKey = node + <span class="hljs-string">":"</span> + i;
            ring.remove(hash(replicaKey));
        }
    }

    <span class="hljs-comment">// Get the node/server to map the given key/value</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> String <span class="hljs-title">getNode</span><span class="hljs-params">(String key)</span> </span>{
        <span class="hljs-keyword">if</span> (ring.isEmpty()) {
            <span class="hljs-keyword">return</span> <span class="hljs-keyword">null</span>;
        }
        <span class="hljs-keyword">int</span> hashKey = hash(key);
        <span class="hljs-keyword">if</span> (!ring.containsKey(hashKey)) {
            SortedMap&lt;Integer, String&gt; tailMap = ring.tailMap(hashKey);
            hashKey = tailMap.isEmpty() ? ring.firstKey() : tailMap.firstKey();
        }
        <span class="hljs-keyword">return</span> ring.get(hashKey);
    }

    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">void</span> <span class="hljs-title">main</span><span class="hljs-params">(String[] args)</span> </span>{
        ConsistentHashing ch = <span class="hljs-keyword">new</span> ConsistentHashing(<span class="hljs-number">3</span>);
        ch.addNode(<span class="hljs-string">"A"</span>);
        ch.addNode(<span class="hljs-string">"B"</span>);
        ch.addNode(<span class="hljs-string">"C"</span>);

        System.out.println(ch.getNode(<span class="hljs-string">"K1"</span>)); <span class="hljs-comment">// Node responsible for K1</span>
        System.out.println(ch.getNode(<span class="hljs-string">"K2"</span>)); <span class="hljs-comment">// Node responsible for K2</span>

        ch.addNode(<span class="hljs-string">"D"</span>); <span class="hljs-comment">// Add a new node</span>
        System.out.println(ch.getNode(<span class="hljs-string">"K2"</span>)); <span class="hljs-comment">// Node responsible for K2 after adding D</span>
    }
}
</code></pre>
<hr />
<h3 id="heading-benefits-of-consistent-hashing">Benefits of Consistent Hashing</h3>
<ol>
<li><p><strong>Minimal Key Movement:</strong></p>
<ul>
<li>When a node joins or leaves, only a small portion of keys are remapped. This is in contrast to traditional hashing, where all keys might need to be redistributed.</li>
</ul>
</li>
<li><p><strong>Load Balancing:</strong></p>
<ul>
<li>Keys are distributed across nodes more evenly, especially when using techniques like <strong>virtual nodes</strong> (assigning multiple positions for each physical node on the ring).</li>
</ul>
</li>
<li><p><strong>Scalability:</strong></p>
<ul>
<li>Adding or removing nodes is seamless, making consistent hashing ideal for systems with dynamic scaling requirements, such as cloud-based applications.</li>
</ul>
</li>
<li><p><strong>Fault Tolerance:</strong></p>
<ul>
<li>When a node fails, its keys are redistributed to adjacent nodes on the ring, ensuring system continuity.</li>
</ul>
</li>
</ol>
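<p>The load-balancing effect of virtual nodes can be demonstrated directly. The sketch below is an illustrative Python counterpart to the Java implementation (node names and key counts are arbitrary): it builds a ring with 1 and then 100 virtual nodes per server and counts how many of 10,000 keys land on each:</p>
<pre><code class="lang-python">import bisect
import hashlib
from collections import Counter

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes, replicas):
    # Each physical node gets `replicas` positions on the ring (virtual nodes)
    return sorted((h(f"{node}:{i}"), node) for node in nodes for i in range(replicas))

def owner(ring, key):
    # First node clockwise from the key's position, wrapping around the ring
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect(positions, h(key)) % len(ring)
    return ring[idx][1]

keys = [f"file-{i}" for i in range(10_000)]
for replicas in (1, 100):
    ring = build_ring(["A", "B", "C", "D"], replicas)
    load = Counter(owner(ring, k) for k in keys)
    print(replicas, dict(load))
</code></pre>
<p>With a single position per server the split is typically lopsided; with 100 virtual nodes per server, each one ends up close to the fair share of 2,500 keys.</p>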
<hr />
<h3 id="heading-comparison-with-other-hashing-techniques">Comparison with Other Hashing Techniques</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>Traditional Hashing</td><td>Consistent Hashing</td></tr>
</thead>
<tbody>
<tr>
<td>Key Movement on Changes</td><td>High (many keys remapped)</td><td>Low (few keys remapped)</td></tr>
<tr>
<td>Scalability</td><td>Poor (requires full rehash)</td><td>Excellent</td></tr>
<tr>
<td>Load Balancing</td><td>Depends on hash function</td><td>Enhanced with virtual nodes</td></tr>
<tr>
<td>Resilience to Failures</td><td>Limited</td><td>High</td></tr>
</tbody>
</table>
</div><hr />
<h3 id="heading-applications-of-consistent-hashing">Applications of Consistent Hashing</h3>
<ol>
<li><p><strong>Distributed Caching:</strong></p>
<ul>
<li>Systems like Memcached and Redis use consistent hashing to distribute keys across nodes.</li>
</ul>
</li>
<li><p><strong>Load Balancers:</strong></p>
<ul>
<li>Consistent hashing helps in assigning incoming requests to servers in web applications.</li>
</ul>
</li>
<li><p><strong>Distributed Databases:</strong></p>
<ul>
<li>Databases like Cassandra and DynamoDB leverage consistent hashing for data partitioning and replication.</li>
</ul>
</li>
</ol>
<hr />
<h3 id="heading-conclusion">Conclusion</h3>
<p>Consistent hashing is a cornerstone of modern distributed systems, enabling efficient and resilient data placement. Its ability to handle dynamic changes with minimal disruption makes it a go-to strategy for scalable and fault-tolerant applications.</p>
<p>Whether you’re building a distributed cache, a load balancer, or a database, understanding and implementing consistent hashing can significantly enhance your system's performance and reliability.</p>
]]></content:encoded></item><item><title><![CDATA[Exploring Retrieval Augmented Generation (RAG) with Vector Databases and AI Agents]]></title><description><![CDATA[One of the recent breakthroughs is Retrieval Augmented Generation (RAG). This concept blends the power of generative models with external retrieval systems to enhance the quality and accuracy of responses. When coupled with vector databases and AI ag...]]></description><link>https://anishratnawat.com/exploring-retrieval-augmented-generation-rag-with-vector-databases-and-ai-agents</link><guid isPermaLink="true">https://anishratnawat.com/exploring-retrieval-augmented-generation-rag-with-vector-databases-and-ai-agents</guid><category><![CDATA[RAG ]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Mon, 03 Feb 2025 17:40:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1738604686406/1541e28b-0c0a-4244-9357-2822842d4866.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>One of the recent breakthroughs is <strong>Retrieval Augmented Generation (RAG)</strong>. This concept blends the power of generative models with external retrieval systems to enhance the quality and accuracy of responses. When coupled with <strong>vector databases</strong> and <strong>AI agents</strong>, RAG creates a highly dynamic and intelligent system capable of delivering more contextually relevant and fact-based outputs. In this blog, we will dive into how RAG works, the role of vector databases, and how AI agents enhance this process.</p>
<h2 id="heading-what-is-retrieval-augmented-generation-rag">What is Retrieval Augmented Generation (RAG)?</h2>
<p>Retrieval Augmented Generation (RAG) is an approach where a generative model doesn't rely solely on its learned parameters but instead enhances its output by retrieving information from a large corpus of data. The retrieval process ensures that the AI system has access to the most up-to-date and accurate information during its response generation.</p>
<p>Traditional models often struggle with answering questions that require factual knowledge not seen during training, leading to hallucinations or incorrect answers. RAG improves upon this by using an external database to retrieve relevant information and passing that information to the generative model for more context-aware and grounded responses.</p>
<h2 id="heading-how-does-rag-work">How Does RAG Work?</h2>
<ol>
<li><p><strong>Query Input</strong>: A user inputs a query, much like any other question or request posed to a system.</p>
</li>
<li><p><strong>Retrieval</strong>: The system first searches an external source of knowledge (like a vector database) for documents, texts, or passages related to the query.</p>
</li>
<li><p><strong>Document Ranking</strong>: Using techniques like <strong>semantic search</strong> or <strong>nearest neighbor search</strong>, relevant documents are ranked and selected based on how similar they are to the query.</p>
</li>
<li><p><strong>Generation</strong>: The retrieved documents are then passed as context to a generative model, like GPT-3 or GPT-4. This model uses the retrieved information along with its internal knowledge to generate a well-informed, accurate response.</p>
</li>
<li><p><strong>Response Output</strong>: The generative model creates a response that incorporates the retrieved information, ensuring it is grounded in facts and highly relevant to the user's query.</p>
</li>
</ol>
<h2 id="heading-the-role-of-vector-databases-in-rag">The Role of Vector Databases in RAG</h2>
<p>Vector databases play a critical role in the retrieval process of RAG. These databases store embeddings (dense vector representations) of large datasets, which makes it easy to perform efficient similarity searches.</p>
<p>When a query arrives, it is transformed into a vector through a process known as <strong>embedding</strong>. This vector is then compared against the vectors stored in the database to find the most relevant documents. Vector databases are optimized for this task, offering high-performance similarity search capabilities. Some popular vector databases include:</p>
<ul>
<li><p><strong>FAISS (Facebook AI Similarity Search)</strong>: An open-source library that allows fast similarity search in high-dimensional spaces.</p>
</li>
<li><p><strong>Pinecone</strong>: A managed vector database service that offers scalable similarity search.</p>
</li>
<li><p><strong>Weaviate</strong>: An open-source vector search engine that can integrate with various machine learning models.</p>
</li>
</ul>
<p>These vector databases help ensure that RAG can retrieve the most relevant documents in real-time, even from massive data corpora.</p>
<h3 id="heading-vector-representation-of-texts">Vector Representation of Texts</h3>
<p>To efficiently perform search, text data (such as documents, articles, or websites) must be converted into vectors. This is done using embedding models like <strong>Sentence-BERT</strong> or <strong>OpenAI’s embedding models</strong>. These models convert each piece of text into a vector of fixed dimensionality, which can then be indexed by the vector database. Similarity measures such as <strong>cosine similarity</strong> or <strong>Euclidean distance</strong> are used to rank the retrieved documents.</p>
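<p>Cosine similarity, the most common of these measures, is straightforward to compute. Here is a minimal sketch (the toy three-dimensional vectors are illustrative; real embeddings have hundreds or thousands of dimensions):</p>
<pre><code class="lang-python">import math

def cosine_similarity(a, b):
    # Dot product of the vectors divided by the product of their lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vec = [0.1, 0.3, 0.5]
doc_vec = [0.2, 0.25, 0.55]
print(round(cosine_similarity(query_vec, doc_vec), 3))  # prints 0.983
</code></pre>
<p>A score near 1 means the document points in almost the same direction as the query in embedding space; documents are ranked by this score during retrieval.</p>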
<h2 id="heading-a-practical-example-of-rag-with-vector-databases-and-ai-agents">A Practical Example of RAG with Vector Databases and AI Agents</h2>
<p>Let’s consider an example of a chatbot built using RAG with a vector database and AI agent:</p>
<h3 id="heading-scenario-an-ai-powered-virtual-assistant-for-technical-support">Scenario: An AI-powered Virtual Assistant for Technical Support</h3>
<p>Imagine you are building a virtual assistant for technical support in a software company. Users will ask questions about the software’s features, installation guides, troubleshooting steps, etc.</p>
<p>Here’s how RAG can be used:</p>
<ol>
<li><p><strong>User Query</strong>: "How do I install the software on Linux?"</p>
</li>
<li><p><strong>AI Agent</strong>: The AI agent processes the query, recognizes that the user is asking about software installation on a Linux system, and formulates a precise query for the vector database: "Linux installation guide for software."</p>
</li>
<li><p><strong>Retrieval</strong>: The vector database retrieves relevant documents, such as installation guides, forums, or knowledge base articles related to Linux installations.</p>
</li>
<li><p><strong>Generation</strong>: The generative model takes these documents and crafts a coherent, step-by-step installation guide tailored to the user’s query.</p>
</li>
<li><p><strong>Response</strong>: The AI agent outputs: "To install the software on Linux, follow these steps... [steps from the retrieved guide]"</p>
</li>
</ol>
<p>The AI agent ensures the process is seamless and context-aware, making it easy for the user to get accurate and relevant answers without needing to sift through long documentation.</p>
<h3 id="heading-implementation">Implementation</h3>
<p>To implement the example above using LangChain and a vector database, you can follow the steps outlined below. We will break the process into three key components:</p>
<ol>
<li><p><strong>Setting up the vector database</strong> to store and retrieve relevant documents.</p>
</li>
<li><p><strong>Integrating LangChain</strong> to connect the vector database with a language model for retrieval-augmented generation (RAG).</p>
</li>
<li><p><strong>Building the AI Agent</strong> that processes the user query, retrieves relevant data, and generates a response.</p>
</li>
</ol>
<p><strong>Step 1: Setup Vector Database</strong></p>
<p>We will use <strong>FAISS</strong> (Facebook AI Similarity Search) or <strong>Pinecone</strong> as the vector database to store document embeddings. First, we need to create embeddings for your documents and store them in the database.</p>
<pre><code class="lang-python">from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Initialize OpenAI embeddings model (or any other model)
embeddings = OpenAIEmbeddings()

# Assuming documents are available as a list of strings
documents = [
    "Linux installation guide for software...",
    "How to troubleshoot software on Linux...",
    "Windows installation steps for software...",
    # more documents here
]

# Build the FAISS index from the documents; from_texts embeds
# each document and stores the resulting vectors in the index
faiss_index = FAISS.from_texts(documents, embeddings)

# Optionally persist the index to disk for later reuse
faiss_index.save_local("path/to/faiss_index")
</code></pre>
<p><strong>Step 2: Retrieval-augmented Generation (RAG) Setup</strong></p>
<p>Now we integrate LangChain to allow the agent to retrieve relevant documents from the vector database and generate responses based on that.</p>
<pre><code class="lang-python">from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
from langchain.agents import initialize_agent, Tool

# Create a retriever over the FAISS index
retriever = faiss_index.as_retriever(search_kwargs={"k": 3})  # Retrieve top 3 results

# Use OpenAI or any other LLM for generation
llm = OpenAI(temperature=0.7)

# Create a RetrievalQA chain (combines retrieval and generation)
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)

# Define the tools for the agent (including the QA system)
tools = [
    Tool(
        name="Technical Support Assistant",
        func=qa_chain.run,
        description="Retrieve technical support documents from the knowledge base."
    )
]

# Initialize the agent with the tools and an LLM
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
</code></pre>
<p><strong>Step 3: Implement the AI Agent for User Queries</strong></p>
<p>The agent will now be capable of handling user queries related to technical support. When a user asks, for example, "How do I install the software on Linux?", the agent will retrieve relevant documents and generate a response.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Simulating user query</span>
user_query = <span class="hljs-string">"How do I install the software on Linux?"</span>

<span class="hljs-comment"># Pass the query to the agent for processing</span>
response = agent.run(user_query)

<span class="hljs-comment"># Display the AI's response</span>
print(response)
</code></pre>
<p><strong>Final Workflow</strong></p>
<ol>
<li><p><strong>User submits a query</strong>: "How do I install the software on Linux?"</p>
</li>
<li><p><strong>Vector database retrieves relevant documents</strong> based on query embeddings.</p>
</li>
<li><p><strong>LangChain agent</strong> processes the retrieved documents and passes them to the generative model (OpenAI, or any other model you're using).</p>
</li>
<li><p><strong>AI agent generates a response</strong> combining retrieved documents in a coherent way, such as a step-by-step guide on how to install the software on Linux.</p>
</li>
</ol>
<p><strong>Additional Notes:</strong></p>
<ul>
<li><p>You can store documents as embeddings in your vector database using various models, including OpenAI, SentenceTransformers, or other pre-trained models.</p>
</li>
<li><p>Ensure the knowledge base is regularly updated to maintain accuracy and relevance.</p>
</li>
<li><p>The agent can be enhanced with more advanced features like error handling, multi-step reasoning, or including additional tools for different types of queries.</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Exploring AI Agents: Step-by-Step Implementation Insights]]></title><description><![CDATA[Artificial Intelligence (AI) has evolved significantly over the years, with Large Language Models (LLMs) leading the way in natural language understanding and generation. However, a new paradigm is emerging—AI Agents. Unlike traditional LLMs, AI Agen...]]></description><link>https://anishratnawat.com/exploring-ai-agents-step-by-step-implementation-insights</link><guid isPermaLink="true">https://anishratnawat.com/exploring-ai-agents-step-by-step-implementation-insights</guid><category><![CDATA[ai agents]]></category><category><![CDATA[AI Agents Explained]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Fri, 24 Jan 2025 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1738594448058/bb32b0e2-fe2d-41d6-8208-8c6903f5e5e4.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Artificial Intelligence (AI) has evolved significantly over the years, with Large Language Models (LLMs) leading the way in natural language understanding and generation. However, a new paradigm is emerging—AI Agents. Unlike traditional LLMs, AI Agents possess autonomy, memory, and the ability to perform goal-oriented tasks, making them more efficient in real-world applications. In this blog, we will explore what AI Agents are, how they differ from LLMs, how to develop custom AI Agents, and their real-world use cases. Finally, we will walk through an example of an AI Agent designed for customer support in an online ticket booking system.</p>
<hr />
<h3 id="heading-how-ai-agents-differ-from-traditional-llms"><strong>How AI Agents Differ from Traditional LLMs</strong></h3>
<p>While both AI Agents and LLMs leverage natural language processing, they differ in key aspects:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Feature</th><th>Traditional LLMs</th><th>AI Agents</th></tr>
</thead>
<tbody>
<tr>
<td><strong>Autonomy</strong></td><td>Passive, responds to prompts</td><td>Active, initiates tasks based on goals</td></tr>
<tr>
<td><strong>Memory</strong></td><td>Stateless, no memory retention</td><td>Stateful, can store and retrieve information</td></tr>
<tr>
<td><strong>Task Execution</strong></td><td>Provides responses without action</td><td>Can execute tasks and interact with external systems</td></tr>
<tr>
<td><strong>Multi-Step Reasoning</strong></td><td>Processes a single query at a time</td><td>Can break complex problems into sub-tasks and complete them</td></tr>
</tbody>
</table>
</div><p>Traditional LLMs require human intervention to drive conversations, whereas AI Agents can operate independently, making decisions and performing tasks dynamically.</p>
<hr />
<h3 id="heading-how-to-develop-custom-ai-agents"><strong>How to Develop Custom AI Agents</strong></h3>
<p>Developing a custom AI Agent involves several key steps:</p>
<ol>
<li><p><strong>Define the Objective:</strong> Identify the purpose of the AI Agent. For example, automating customer service interactions.</p>
</li>
<li><p><strong>Choose a Framework:</strong> Libraries such as LangChain, AutoGen, and OpenAI's Function Calling API can help build AI Agents.</p>
</li>
<li><p><strong>Implement Memory:</strong> Utilize vector databases like Pinecone or Redis to provide persistent memory.</p>
</li>
<li><p><strong>Incorporate Tools &amp; APIs:</strong> Equip the agent with access to databases, APIs, and external tools to complete tasks.</p>
</li>
<li><p><strong>Implement a Decision-Making Process:</strong> Use reinforcement learning or rule-based logic for better decision-making.</p>
</li>
<li><p><strong>Deploy and Monitor:</strong> Deploy the agent to production and continuously optimize its performance.</p>
</li>
</ol>
<h4 id="heading-coding-example-using-langchain"><strong>Coding Example Using LangChain</strong></h4>
<p>Below is a simple example of building a custom AI Agent using LangChain:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain.llms <span class="hljs-keyword">import</span> OpenAI
<span class="hljs-keyword">from</span> langchain.agents <span class="hljs-keyword">import</span> initialize_agent, AgentType
<span class="hljs-keyword">from</span> langchain.tools <span class="hljs-keyword">import</span> Tool

<span class="hljs-comment"># Define an LLM instance</span>
llm = OpenAI(model_name=<span class="hljs-string">"gpt-4"</span>)

<span class="hljs-comment"># Define a tool for the agent to use</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fetch_ticket_availability</span>(<span class="hljs-params">query</span>):</span>
    <span class="hljs-keyword">return</span> <span class="hljs-string">"Available tickets for your destination: Flight A, Flight B, Flight C"</span>

tool = Tool(
    name=<span class="hljs-string">"TicketAvailability"</span>,
    func=fetch_ticket_availability,
    description=<span class="hljs-string">"Fetch available tickets based on user query"</span>
)

<span class="hljs-comment"># Initialize the agent</span>
agent = initialize_agent(
    tools=[tool],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=<span class="hljs-literal">True</span>
)

<span class="hljs-comment"># Test the agent</span>
response = agent.run(<span class="hljs-string">"Find me flights from New Delhi to New York for next Monday"</span>)
print(response)
</code></pre>
<p>This example demonstrates how to integrate a LangChain-powered AI Agent with an external tool to fetch flight availability based on user input.</p>
<p><strong>The agent determines whether to call a tool based on the input query</strong> and its internal reasoning process. In LangChain, this is achieved through a combination of:</p>
<ol>
<li><p><strong>Tool Descriptions</strong>: Each tool (like <code>TicketAvailability</code> in this case) has a description that helps the agent understand when to use it.</p>
</li>
<li><p><strong>LLM Decision-Making</strong>: The agent uses an LLM to analyze the input query and decide if any tool needs to be invoked.</p>
</li>
<li><p><strong>ReAct Framework</strong>: LangChain's <code>ZERO_SHOT_REACT_DESCRIPTION</code> agent type follows the ReAct (Reasoning + Acting) paradigm, meaning it first reasons about the input, decides on an action, and then executes the appropriate tool.</p>
</li>
<li><p><strong>Execution Flow:</strong></p>
<ul>
<li><p>The agent receives a user query.</p>
</li>
<li><p>It parses the intent (e.g., finding flights).</p>
</li>
<li><p>If the query matches the function of a registered tool (e.g., fetching ticket availability), the agent calls that tool.</p>
</li>
<li><p>The tool executes its function and returns a response.</p>
</li>
<li><p>The agent processes the response and provides a final answer to the user.</p>
</li>
</ul>
</li>
</ol>
<p>Thus, when the user asks, <em>"Find me flights from New Delhi to New York for next Monday"</em>, the agent:</p>
<ul>
<li><p>Recognizes that the query is related to flight availability.</p>
</li>
<li><p>Identifies <code>TicketAvailability</code> as a relevant tool.</p>
</li>
<li><p>Calls the function <code>fetch_ticket_availability()</code>, retrieves the results, and returns them to the user.</p>
</li>
</ul>
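<p>The selection step described above can be sketched in plain Python. In the real agent the LLM reasons over each tool's description; the hypothetical <code>pick_tool</code> below imitates that decision with simple keyword overlap, purely to make the flow concrete.</p>

```python
# Toy stand-in for the agent's tool-selection step: the real agent lets the
# LLM reason over each tool's description; here we fake it with keyword overlap.
TOOLS = {
    "TicketAvailability": "Fetch available tickets based on user query",
    "WeatherLookup": "Get the weather forecast for a city",
}

def pick_tool(query):
    query_words = set(query.lower().split())
    best, best_score = None, 0
    for name, description in TOOLS.items():
        score = len(query_words & set(description.lower().split()))
        if score > best_score:
            best, best_score = name, score
    return best  # None means: answer directly, no tool call

print(pick_tool("Find me available tickets from New Delhi to New York"))
```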
<hr />
<h3 id="heading-use-cases-of-ai-agents"><strong>Use Cases of AI Agents</strong></h3>
<p>AI Agents can be applied in multiple domains, including:</p>
<ul>
<li><p><strong>Customer Support:</strong> Handling queries, resolving complaints, and managing bookings.</p>
</li>
<li><p><strong>Healthcare:</strong> Assisting with medical diagnoses and patient follow-ups.</p>
</li>
<li><p><strong>Finance:</strong> Providing investment recommendations and fraud detection.</p>
</li>
<li><p><strong>E-commerce:</strong> Offering personalized shopping assistance and order tracking.</p>
</li>
<li><p><strong>Software Development:</strong> Automating bug detection and generating code snippets.</p>
</li>
</ul>
<hr />
<h3 id="heading-example-ai-agent-for-customer-support-in-online-ticket-booking"><strong>Example: AI Agent for Customer Support in Online Ticket Booking</strong></h3>
<p>Let’s walk through an example of an AI Agent designed for customer support in an online ticket booking system.</p>
<h4 id="heading-objective"><strong>Objective</strong></h4>
<p>To automate customer queries and assist with ticket bookings, cancellations, and modifications.</p>
<h4 id="heading-architecture"><strong>Architecture</strong></h4>
<ol>
<li><p><strong>LLM for Natural Language Understanding</strong> - GPT-based model for conversational interface.</p>
</li>
<li><p><strong>Memory Store</strong> - A Redis-based database for storing user history.</p>
</li>
<li><p><strong>APIs for Integration</strong> - Connecting with ticket booking systems (e.g., airline, train, event platforms).</p>
</li>
<li><p><strong>Decision Engine</strong> - Rule-based or reinforcement learning model for handling customer queries.</p>
</li>
</ol>
<h4 id="heading-workflow"><strong>Workflow</strong></h4>
<ol>
<li><p><strong>User Query:</strong> "I want to book a flight from New Delhi to New York for next Monday."</p>
</li>
<li><p><strong>Intent Recognition:</strong> The agent extracts key details: origin (New Delhi), destination (New York), date (next Monday).</p>
</li>
<li><p><strong>API Call:</strong> The agent fetches available flights and presents options.</p>
</li>
<li><p><strong>User Confirmation:</strong> The user selects a preferred flight.</p>
</li>
<li><p><strong>Booking Completion:</strong> The agent books the flight and provides a confirmation.</p>
</li>
<li><p><strong>Follow-up:</strong> If needed, the agent can assist with cancellations, seat selection, or meal preferences.</p>
</li>
</ol>
<hr />
<h3 id="heading-sample-implementation-of-ai-agent-for-customer-support-based-on-above">Sample Implementation of <strong>AI Agent for Customer Support</strong> based on above</h3>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain.chat_models <span class="hljs-keyword">import</span> ChatOpenAI
<span class="hljs-keyword">from</span> langchain.memory <span class="hljs-keyword">import</span> RedisChatMessageHistory
<span class="hljs-keyword">from</span> langchain.agents <span class="hljs-keyword">import</span> initialize_agent, Tool
<span class="hljs-keyword">from</span> langchain.tools <span class="hljs-keyword">import</span> tool
<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">import</span> datetime

<span class="hljs-comment"># Define a function to fetch available flights</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fetch_flights</span>(<span class="hljs-params">origin, destination, date</span>):</span>
    <span class="hljs-comment"># Placeholder function: Replace with actual API calls to airline or travel service</span>
    <span class="hljs-keyword">return</span> [
        {<span class="hljs-string">"flight"</span>: <span class="hljs-string">"AI 101"</span>, <span class="hljs-string">"departure"</span>: <span class="hljs-string">"10:00 AM"</span>, <span class="hljs-string">"arrival"</span>: <span class="hljs-string">"2:00 PM"</span>, <span class="hljs-string">"price"</span>: <span class="hljs-string">"$500"</span>},
        {<span class="hljs-string">"flight"</span>: <span class="hljs-string">"UA 202"</span>, <span class="hljs-string">"departure"</span>: <span class="hljs-string">"1:00 PM"</span>, <span class="hljs-string">"arrival"</span>: <span class="hljs-string">"5:00 PM"</span>, <span class="hljs-string">"price"</span>: <span class="hljs-string">"$550"</span>},
    ]

<span class="hljs-comment"># Define a function to book flights</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">book_flight</span>(<span class="hljs-params">flight_id, user_details</span>):</span>
    <span class="hljs-comment"># Placeholder function: Replace with actual API call to book the flight</span>
    <span class="hljs-keyword">return</span> {<span class="hljs-string">"status"</span>: <span class="hljs-string">"confirmed"</span>, <span class="hljs-string">"flight_id"</span>: flight_id, <span class="hljs-string">"user"</span>: user_details}

<span class="hljs-comment"># LangChain Tool for fetching flights</span>
<span class="hljs-meta">@tool</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_flights</span>(<span class="hljs-params">origin: str, destination: str, date: str</span>):</span>
    <span class="hljs-string">"""Fetches available flights given origin, destination, and date."""</span>
    flights = fetch_flights(origin, destination, date)
    <span class="hljs-keyword">return</span> flights

<span class="hljs-comment"># LangChain Tool for booking flights</span>
<span class="hljs-meta">@tool</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">book_flight_tool</span>(<span class="hljs-params">flight_id: str = None, user_details: dict = None</span>):</span>
    <span class="hljs-string">"""Books a flight given flight ID and user details. If missing, prompts user for input."""</span>

    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> flight_id:
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Please provide a flight ID from the available options."</span>

    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> user_details <span class="hljs-keyword">or</span> <span class="hljs-string">"name"</span> <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> user_details <span class="hljs-keyword">or</span> <span class="hljs-string">"email"</span> <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> user_details:
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Please provide user details including name and email."</span>

    booking = book_flight(flight_id, user_details)
    <span class="hljs-keyword">return</span> booking


<span class="hljs-comment"># Memory store: wrap the Redis-backed history in ConversationBufferMemory</span>
<span class="hljs-comment"># so the agent can use it as conversational memory</span>
<span class="hljs-keyword">from</span> langchain.memory <span class="hljs-keyword">import</span> ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key=<span class="hljs-string">"chat_history"</span>,
    chat_memory=RedisChatMessageHistory(url=<span class="hljs-string">"redis://localhost:6379/0"</span>),
)

<span class="hljs-comment"># Initialize LLM</span>
llm = ChatOpenAI(model_name=<span class="hljs-string">"gpt-4"</span>)

<span class="hljs-comment"># Define the agent (the conversational agent type supports chat memory)</span>
tools = [get_flights, book_flight_tool]
agent = initialize_agent(
    tools, llm, agent=<span class="hljs-string">"conversational-react-description"</span>, verbose=<span class="hljs-literal">True</span>, memory=memory
)

<span class="hljs-comment"># Example query</span>
response = agent.run(<span class="hljs-string">"I want to book a flight from New Delhi to New York for next Monday."</span>)
print(response)
</code></pre>
<p><strong>How AI Agents handle memory</strong></p>
<p>The agent writes conversation history into Redis using the <code>RedisChatMessageHistory</code> memory store. Specifically, it stores messages exchanged between the user and the agent, allowing the AI to maintain context across interactions.</p>
<p><strong>What Gets Stored in Redis?</strong></p>
<ol>
<li><p><strong>User Messages:</strong> The queries or requests made by the user (e.g., <em>"I want to book a flight from New Delhi to New York for next Monday."</em>).</p>
</li>
<li><p><strong>Agent Responses:</strong> The replies generated by the AI (e.g., <em>"Here are the available flights for your route."</em>).</p>
</li>
<li><p><strong>Contextual Memory:</strong> If the user continues the conversation (e.g., <em>"Book the first one."</em>), the agent remembers the previous flight options presented.</p>
</li>
</ol>
<p><strong>How Does Redis Store the Data?</strong></p>
<ul>
<li><p>The <code>RedisChatMessageHistory</code> class stores messages as a key-value structure in Redis.</p>
</li>
<li><p>Each user session is typically associated with a unique key (e.g., <code>chat:&lt;session_id&gt;</code>).</p>
</li>
<li><p>Messages are stored in chronological order, allowing retrieval for context-based responses.</p>
</li>
</ul>
<p><strong>Example of Stored Data in Redis</strong></p>
<pre><code class="lang-json">{<span class="hljs-attr">"chat:session_123"</span>: [
        {<span class="hljs-attr">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-attr">"message"</span>: <span class="hljs-string">"I want to book a flight from New Delhi to New York for next Monday."</span>},
        {<span class="hljs-attr">"role"</span>: <span class="hljs-string">"agent"</span>, <span class="hljs-attr">"message"</span>: <span class="hljs-string">"Here are the available flights: AI 101 - $500, UA 202 - $550."</span>},
        {<span class="hljs-attr">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-attr">"message"</span>: <span class="hljs-string">"Book AI 101."</span>},
        {<span class="hljs-attr">"role"</span>: <span class="hljs-string">"agent"</span>, <span class="hljs-attr">"message"</span>: <span class="hljs-string">"Your booking for AI 101 is confirmed."</span>}
    ]
}
</code></pre>
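<p>The storage pattern above can be sketched with an in-memory stand-in for Redis (no server required). The key layout, <code>chat:&lt;session_id&gt;</code> mapping to an ordered list of messages, mirrors the structure shown above; the class here is illustrative and is not LangChain's API.</p>

```python
import json

class InMemoryChatHistory:
    """Illustrative stand-in for a Redis-backed chat history.

    Messages are appended in chronological order under a per-session key,
    mirroring the chat:<session_id> layout shown above.
    """

    def __init__(self):
        self.store = {}  # key -> list of JSON-encoded messages

    def add_message(self, session_id, role, message):
        key = f"chat:{session_id}"
        self.store.setdefault(key, []).append(
            json.dumps({"role": role, "message": message})
        )

    def get_messages(self, session_id):
        key = f"chat:{session_id}"
        return [json.loads(m) for m in self.store.get(key, [])]

history = InMemoryChatHistory()
history.add_message("session_123", "user", "I want to book a flight.")
history.add_message("session_123", "agent", "Here are the available flights.")
print(history.get_messages("session_123"))
```

<p>Swapping the dict for a Redis list (e.g. <code>RPUSH</code>/<code>LRANGE</code>) gives the same behavior with persistence across processes.</p>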
<p>The agent utilizes the message history stored in Redis to maintain context and continuity in the conversation. Here’s how it works:</p>
<hr />
<p><strong>1. Retrieving Conversation History</strong></p>
<p>The <code>RedisChatMessageHistory</code> memory store acts as a persistent message history. Each time the user interacts with the agent, it retrieves past interactions from Redis, allowing it to remember the conversation.</p>
<ul>
<li><p>When a user starts a new session, Redis retrieves previous messages using a unique session key (e.g., <code>chat:&lt;session_id&gt;</code>).</p>
</li>
<li><p>The LangChain memory module feeds this history into the LLM, enabling it to generate responses based on past exchanges.</p>
</li>
</ul>
<hr />
<p><strong>2. Contextual Understanding</strong></p>
<p>Since the agent maintains history, it can:</p>
<p>✅ <strong>Understand Follow-up Queries</strong><br />If a user says:</p>
<ul>
<li><p><strong>User:</strong> <em>"I want to book a flight from New Delhi to New York for next Monday."</em></p>
</li>
<li><p><strong>Agent:</strong> <em>"Here are the available flights: AI 101 - $500, UA 202 - $550."</em></p>
</li>
<li><p><strong>User:</strong> <em>"Book the first one."</em></p>
</li>
</ul>
<p>The agent remembers "AI 101" as the first option without needing the user to repeat it.</p>
<p>✅ <strong>Maintain Personalization</strong><br />If a user previously requested vegetarian meals or window seats, the agent can recall this preference.</p>
<p>✅ <strong>Handle Multi-turn Conversations</strong></p>
<ul>
<li><p><strong>User:</strong> <em>"What’s my booking status?"</em></p>
</li>
<li><p><strong>Agent:</strong> (retrieves previous booking confirmation) <em>"Your flight AI 101 is confirmed."</em></p>
</li>
</ul>
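<p>Resolving a follow-up like <em>"Book the first one"</em> relies on the options stored in the conversation history. A minimal sketch of that resolution step (illustrative only, not LangChain code):</p>

```python
# Toy resolver for ordinal follow-ups ("the first one", "the second one"),
# using the options the agent previously presented in this session.
ORDINALS = {"first": 0, "second": 1, "third": 2}

def resolve_reference(query, last_options):
    for word, index in ORDINALS.items():
        if word in query.lower() and index < len(last_options):
            return last_options[index]
    return None  # no ordinal found; ask the user to clarify

last_options = ["AI 101", "UA 202"]  # what the agent offered last turn
print(resolve_reference("Book the first one", last_options))
```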
<hr />
<p><strong>3. How LangChain Uses History</strong></p>
<p>LangChain’s memory mechanism ensures that past interactions are passed as part of the conversation context.</p>
<ul>
<li><p><strong>Example without memory:</strong></p>
<ul>
<li><p>User: <em>"Book the first one."</em></p>
</li>
<li><p>Agent: <em>"I don’t understand. Which flight?"</em></p>
</li>
</ul>
</li>
<li><p><strong>Example with memory:</strong></p>
<ul>
<li><p>User: <em>"Book the first one."</em></p>
</li>
<li><p>Agent: <em>(Remembers previous options)</em> <em>"Your flight AI 101 is confirmed."</em></p>
</li>
</ul>
</li>
</ul>
<hr />
]]></content:encoded></item><item><title><![CDATA[Guide to Choosing the Right Database for Your App]]></title><description><![CDATA[Selecting the appropriate data store for your application is crucial. Here’s an overview of database types and when to choose each for specific use cases:

Relational Databases (RDBMS)

Characteristics:

Organize data into tables with predefined sche...]]></description><link>https://anishratnawat.com/guide-to-choosing-the-right-database-for-your-app</link><guid isPermaLink="true">https://anishratnawat.com/guide-to-choosing-the-right-database-for-your-app</guid><category><![CDATA[chooserightdatabase]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Sat, 12 Oct 2024 18:30:00 GMT</pubDate><content:encoded><![CDATA[<p>Selecting the appropriate data store for your application is crucial. Here’s an overview of database types and when to choose each for specific use cases:</p>
<hr />
<h1 id="heading-relational-databases-rdbms"><strong>Relational Databases (RDBMS)</strong></h1>
<ul>
<li><p><strong>Characteristics:</strong></p>
<ul>
<li><p>Organize data into tables with predefined schemas.</p>
</li>
<li><p>Support strong ACID (Atomicity, Consistency, Isolation, Durability) properties.</p>
</li>
<li><p>Use SQL for querying.</p>
</li>
<li><p>Best for applications needing complex queries, transactions, and structured data.</p>
</li>
<li><p>Sharding is possible but not well supported (there is no built-in support).</p>
<ul>
<li>SQL guarantees consistency, but waiting for all shards to agree on a transaction can be costly.</li>
</ul>
</li>
</ul>
</li>
<li><p><strong>Databases:</strong></p>
<ul>
<li>Microsoft SQL Server, PostgreSQL, MySQL, Oracle Database.</li>
</ul>
</li>
<li><p><strong>When to Choose:</strong></p>
<ul>
<li><p>Applications with <strong>structured data</strong></p>
</li>
<li><p>Applications with <strong>fixed/strict schemas.</strong></p>
</li>
<li><p>Scenarios needing complex relationships between entities (e.g., e-commerce, banking) or <strong>complex joins</strong>.</p>
</li>
<li><p>Use cases requiring <strong>strict consistency and ACID transaction</strong> support.</p>
</li>
</ul>
</li>
<li><p><strong>Examples</strong></p>
<ul>
<li><p>Inventory/Order/Reporting management</p>
</li>
<li><p>Accounting/ Banking</p>
</li>
</ul>
</li>
</ul>
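<p>The ACID guarantees above can be demonstrated with SQLite, which ships with Python. The sketch below shows atomicity: a transfer that fails midway rolls back, leaving both account balances untouched.</p>

```python
import sqlite3

# In-memory database: schema and seed data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit together, or neither does."""
    try:
        with conn:  # opens a transaction; rolls back automatically on exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            remaining = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                                     (src,)).fetchone()[0]
            if remaining < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False

transfer(conn, "alice", "bob", 30)    # succeeds: alice 70, bob 80
transfer(conn, "alice", "bob", 1000)  # fails and rolls back: balances unchanged
print(dict(conn.execute("SELECT name, balance FROM accounts")))
```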
<hr />
<h1 id="heading-nosql-databases"><strong>NoSQL Databases</strong></h1>
<p><strong>When to Choose:</strong></p>
<ul>
<li><p>Applications with <strong>semi-structured data</strong></p>
</li>
<li><p>Applications with a <strong>dynamic schema</strong></p>
</li>
<li><p><strong>Little need</strong> for <strong>complex joins</strong></p>
</li>
<li><p>Need to store many <strong>TBs of data</strong> while remaining <strong>highly scalable</strong></p>
</li>
</ul>
<h3 id="heading-key-value-stores"><strong>Key-Value Stores</strong></h3>
<ul>
<li><p><strong>Characteristics:</strong></p>
<ul>
<li><p>Simplest type of NoSQL database.</p>
</li>
<li><p>Data is stored as key-value pairs.</p>
</li>
<li><p>Optimized for fast reads and writes.</p>
</li>
</ul>
</li>
<li><p><strong>Databases:</strong></p>
<ul>
<li>Redis, DynamoDB.</li>
</ul>
</li>
<li><p><strong>When to Choose:</strong></p>
<ul>
<li><p>Data is accessed using a single key, like a dictionary.</p>
</li>
<li><p>No joins, locks, or unions are required.</p>
</li>
<li><p>No aggregation mechanisms are used.</p>
</li>
<li><p>Quick lookups are needed and relationships are minimal.</p>
</li>
</ul>
</li>
<li><p><strong>Example:</strong></p>
<ul>
<li>Caching, session management, and real-time analytics.</li>
</ul>
</li>
</ul>
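<p>The caching use case maps naturally onto a key-value store's get/set-with-expiry interface. A minimal in-process sketch with Redis-like TTL semantics (no server involved):</p>

```python
import time

class TTLCache:
    """Tiny key-value cache with per-key expiry, mimicking SET key value EX ttl."""

    def __init__(self):
        self.data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl else None
        self.data[key] = (value, expires_at)

    def get(self, key):
        entry = self.data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self.data[key]  # lazy expiry, as Redis does on access
            return None
        return value

cache = TTLCache()
cache.set("session:42", {"user": "anish"}, ttl=0.05)
print(cache.get("session:42"))  # session present before expiry
time.sleep(0.06)
print(cache.get("session:42"))  # gone after the TTL
```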
<h3 id="heading-document-databases"><strong>Document Databases</strong></h3>
<ul>
<li><p><strong>Characteristics:</strong></p>
<ul>
<li><p>Store semi-structured data as JSON or BSON documents.</p>
</li>
<li><p>Allow flexible schemas and hierarchical data.</p>
</li>
</ul>
</li>
<li><p><strong>Databases:</strong></p>
<ul>
<li>MongoDB, DynamoDB, Cosmos DB.</li>
</ul>
</li>
<li><p><strong>When to Choose:</strong></p>
<ul>
<li><p>Flexible schemas are required, with varying indexes needed across multiple fields</p>
</li>
<li><p>Scenarios where data structure varies across records.</p>
</li>
</ul>
</li>
<li><p><strong>Example:</strong></p>
<ul>
<li>Content management systems, product catalogs, or applications needing flexible schemas.</li>
</ul>
</li>
</ul>
<h3 id="heading-column-family-stores"><strong>Column-Family Stores</strong></h3>
<ul>
<li><p><strong>Characteristics:</strong></p>
<ul>
<li><p>Data is stored in tables but optimized for columnar storage instead of rows. Each column is part of a column family.</p>
</li>
<li><p>Scalable and efficient for write-heavy workloads.</p>
</li>
<li><p>Update and delete operations are rare.</p>
</li>
<li><p>Designed to provide high throughput and low-latency access.</p>
</li>
</ul>
</li>
<li><p><strong>Databases:</strong></p>
<ul>
<li>Cassandra, HBase, ScyllaDB.</li>
</ul>
</li>
<li><p><strong>When to Choose:</strong></p>
<ul>
<li><p>Heavy write operations</p>
</li>
<li><p>High throughput and low latency</p>
</li>
</ul>
</li>
<li><p><strong>Example:</strong></p>
<ul>
<li>Recommendations, Personalization, Sensor data, Telemetry, Messaging, Social media analytics, Activity monitoring, Weather and other time-series data</li>
</ul>
</li>
</ul>
<h3 id="heading-graph-databases"><strong>Graph Databases</strong></h3>
<ul>
<li><p><strong>Characteristics:</strong></p>
<ul>
<li><p>Represent data as nodes and edges, ideal for modeling relationships.</p>
</li>
<li><p>Use graph-based querying languages like Gremlin or Cypher.</p>
</li>
</ul>
</li>
<li><p><strong>Examples:</strong></p>
<ul>
<li>Neo4j, Cosmos DB (Gremlin API), Amazon Neptune.</li>
</ul>
</li>
<li><p><strong>When to Choose:</strong></p>
<ul>
<li><p>Social networks, recommendation engines, fraud detection.</p>
</li>
<li><p>Use cases where relationships are critical and highly interconnected.</p>
</li>
</ul>
</li>
</ul>
<h2 id="heading-time-series-databases"><strong>Time-Series Databases</strong></h2>
<ul>
<li><p><strong>Characteristics:</strong></p>
<ul>
<li><p>Designed to handle time-stamped or sequential data.</p>
</li>
<li><p>Optimized for time-series analysis and aggregation.</p>
</li>
</ul>
</li>
<li><p><strong>Examples:</strong></p>
<ul>
<li>InfluxDB, TimescaleDB, OpenTSDB.</li>
</ul>
</li>
<li><p><strong>When to Choose:</strong></p>
<ul>
<li>IoT sensor data, financial transactions, performance monitoring.</li>
</ul>
</li>
</ul>
<h2 id="heading-search-databases"><strong>Search Databases</strong></h2>
<ul>
<li><p><strong>Characteristics:</strong></p>
<ul>
<li><p>Optimized for full-text search and analysis.</p>
</li>
<li><p>Provide advanced query and indexing capabilities for text.</p>
</li>
</ul>
</li>
<li><p><strong>Examples:</strong></p>
<ul>
<li>Elasticsearch, Solr, Azure Cognitive Search.</li>
</ul>
</li>
<li><p><strong>When to Choose:</strong></p>
<ul>
<li><p>Applications needing text search or log analysis.</p>
</li>
<li><p>Use cases like e-commerce product search or log monitoring.</p>
</li>
</ul>
</li>
</ul>
<h2 id="heading-object-storage"><strong>Object Storage</strong></h2>
<ul>
<li><p><strong>Characteristics:</strong></p>
<ul>
<li><p>Designed for storing large, unstructured data (files, images, videos).</p>
</li>
<li><p>Offers flat namespaces and metadata for objects.</p>
</li>
</ul>
</li>
<li><p><strong>Examples:</strong></p>
<ul>
<li>Amazon S3, Azure Blob Storage.</li>
</ul>
</li>
<li><p><strong>When to Choose:</strong></p>
<ul>
<li>Media storage, backups, archival, or big data pipelines.</li>
</ul>
</li>
</ul>
<h2 id="heading-in-memory-databases"><strong>In-Memory Databases</strong></h2>
<ul>
<li><p><strong>Characteristics:</strong></p>
<ul>
<li><p>Data is stored in memory for ultra-fast access.</p>
</li>
<li><p>Often used as caching layers.</p>
</li>
</ul>
</li>
<li><p><strong>Examples:</strong></p>
<ul>
<li>Redis, Memcached.</li>
</ul>
</li>
<li><p><strong>When to Choose:</strong></p>
<ul>
<li>Real-time analytics, session stores, or scenarios requiring low-latency data access.</li>
</ul>
</li>
</ul>
<hr />
<h1 id="heading-choosing-the-right-database"><strong>Choosing the Right Database</strong></h1>
<ol>
<li><p><strong>Workload Type:</strong></p>
<ul>
<li><p>OLTP (Online Transaction Processing): Relational or document databases.</p>
</li>
<li><p>OLAP (Online Analytical Processing): Column-family or time-series databases.</p>
</li>
</ul>
</li>
<li><p><strong>Data Relationships:</strong></p>
<ul>
<li><p>Strong relationships: Relational or graph databases.</p>
</li>
<li><p>Weak or no relationships: NoSQL (key-value, document).</p>
</li>
</ul>
</li>
<li><p><strong>Scalability:</strong></p>
<ul>
<li><p>Horizontal scalability: NoSQL databases.</p>
</li>
<li><p>Vertical scalability: Relational databases.</p>
</li>
</ul>
</li>
<li><p><strong>Consistency vs. Availability:</strong></p>
<ul>
<li><p>Strict consistency: Relational databases.</p>
</li>
<li><p>Eventual consistency: NoSQL databases.</p>
</li>
</ul>
</li>
<li><p><strong>Schema Flexibility:</strong></p>
<ul>
<li><p>Fixed schema: Relational databases.</p>
</li>
<li><p>Flexible schema: Document or key-value stores.</p>
</li>
</ul>
</li>
<li><p><strong>Querying Needs:</strong></p>
<ul>
<li><p>Complex queries: Relational or graph databases.</p>
</li>
<li><p>Simple queries: Key-value or column-family stores.</p>
</li>
</ul>
</li>
</ol>
<p>By analyzing your use case across these dimensions, you can confidently choose the database that best fits your requirements.</p>
<p><strong>Which one to choose, and when</strong></p>
<ul>
<li><p>If ACID is required, choose SQL; otherwise consider NoSQL.</p>
<ul>
<li>Replication and sharding can be achieved in both, but sharding is harder in SQL because there is no built-in support.</li>
</ul>
</li>
<li><p>If you need high availability and can compromise on consistency, choose NoSQL.</p>
</li>
<li><p>Consider the factors below while choosing a database:</p>
<ul>
<li>Do you have <strong>Structured</strong> data? Do you need complex <strong>Joins</strong>? Do you need <strong>Transactions</strong>? What <strong>Consistency</strong> level? Do you need high <strong>Scalability</strong>? <strong>(SJTCS)</strong></li>
</ul>
</li>
</ul>
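<p>The SJTCS checklist above can be encoded as a rough rule of thumb. Treat this as a mnemonic rather than a real decision procedure; actual choices weigh many more factors (cost, team expertise, ecosystem).</p>

```python
def suggest_database(structured, complex_joins, transactions,
                     strict_consistency, high_scalability):
    """Rough rule-of-thumb encoding of the SJTCS checklist above."""
    # ACID transactions, strict consistency, or complex joins point to SQL.
    if transactions or strict_consistency or complex_joins:
        return "SQL (RDBMS)"
    # Massive scale or loosely structured data points to NoSQL.
    if high_scalability or not structured:
        return "NoSQL"
    return "Either; decide on other factors (cost, team expertise, ecosystem)"

print(suggest_database(structured=True, complex_joins=True, transactions=True,
                       strict_consistency=True, high_scalability=False))
```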
<p><strong>References:</strong></p>
<p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/architecture/guide/technology-choices/data-store-overview">https://learn.microsoft.com/en-us/azure/architecture/guide/technology-choices/data-store-overview</a></p>
]]></content:encoded></item><item><title><![CDATA[Understanding the CAP Theorem]]></title><description><![CDATA[What is CAP Theorem
In the world of distributed systems, the CAP theorem is a fundamental concept that guides the design and architecture of these systems. Proposed by Eric Brewer in 2000, the CAP theorem states that it is impossible for a distribute...]]></description><link>https://anishratnawat.com/understanding-the-cap-theorem</link><guid isPermaLink="true">https://anishratnawat.com/understanding-the-cap-theorem</guid><category><![CDATA[CAP-Theorem]]></category><category><![CDATA[#CAPTheorem ]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Sat, 17 Aug 2024 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1736788999264/febbebd1-e892-4730-a400-1c20cf3c9648.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-what-is-cap-theorem">What is CAP Theorem</h2>
<p>In the world of distributed systems, the CAP theorem is a fundamental concept that guides the design and architecture of these systems. Proposed by Eric Brewer in 2000, the CAP theorem states that it is impossible for a distributed system to simultaneously guarantee all three of the following properties:</p>
<ol>
<li><p><strong>Consistency</strong></p>
</li>
<li><p><strong>Availability</strong></p>
</li>
<li><p><strong>Partition Tolerance</strong></p>
</li>
</ol>
<h3 id="heading-consistency"><strong>Consistency</strong></h3>
<p>Consistency ensures that <strong><em>every read request reflects the most recent write.</em></strong></p>
<p>In other words, <strong><em>all nodes have the same view of the data/state</em></strong> at any given time. When a client queries the system, it always retrieves the latest data.</p>
<h3 id="heading-availability"><strong>Availability</strong></h3>
<p>Availability ensures that <strong><em>every request (whether a read of recent or stale data, or a write) always receives a response, even if there is a node failure or a partition breakdown</em></strong>. This means the system remains operational, providing a response to every query.</p>
<h3 id="heading-partition-tolerance"><strong>Partition Tolerance</strong></h3>
<p>Partition tolerance guarantees that the system <strong><em>continues to operate despite any number of communication breakdowns/ network partitions between the nodes</em></strong>. In a distributed environment, <strong><em>network partitions are inevitable</em></strong> due to hardware failures, network congestion, or other issues.</p>
<hr />
<h2 id="heading-deep-dive-into-cap-theorem">Deep Dive into CAP Theorem</h2>
<p>A distributed system always needs to be partition tolerant; we shouldn't build a system where a network partition brings down the whole system.<br />So, a distributed system is always built to be <strong>Partition Tolerant</strong>.</p>
<p>In simple words, the <strong>CAP theorem</strong> says that when a network partition occurs, if you want your system to keep functioning, you can provide either <strong>Availability</strong> or <strong>Consistency</strong>, but not both.</p>
<h3 id="heading-how-a-distributed-system-breaks-consistency-or-availability"><strong>How a Distributed System breaks Consistency or Availability?</strong></h3>
<p><strong>Scenario 1: Multi-node system where multiple nodes can handle reads/writes, and a node fails to propagate an update to the other nodes.</strong></p>
<p>Consider a cluster with two nodes, N1 and N2, both capable of handling read and write requests.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736787874402/139e4b5d-2b2f-4d69-9841-6765aaa9b7de.png" alt class="image--center mx-auto" /></p>
<p>In the diagram above, N1 receives an update request for <code>id=2</code>, modifying the salary from 800 to 1000. However, due to a network partition, N1 cannot propagate this update to N2.</p>
<p>When a read request is directed to N2, the node has two possible responses:</p>
<ol>
<li><p><strong>Respond with its current data</strong> (salary = 800) and update it later once the network partition is resolved. This approach makes the system <strong>available</strong> but <strong>not consistent</strong>.</p>
</li>
<li><p><strong>Return an error</strong>, indicating it does not have the latest data. This ensures <strong>consistency</strong> by avoiding the return of stale data but compromises <strong>availability</strong>.</p>
</li>
</ol>
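<p>The choice N2 faces can be sketched in a few lines of Python (node names and values here are illustrative, mirroring the diagram above):</p>

```python
# Minimal sketch of the choice a partitioned node faces: answer with
# possibly-stale data (AP) or refuse to answer (CP).

class Node:
    def __init__(self, data):
        self.data = dict(data)   # this node's local copy of the records
        self.reachable = True    # can this node reach its peers?

def read(node, key, mode):
    if node.reachable:
        return node.data[key]    # no partition: safe to answer
    if mode == "AP":
        return node.data[key]    # answer anyway -- may be stale
    raise RuntimeError("cannot confirm latest value")  # CP: refuse

# N1 took the write (salary 800 -> 1000) but cannot reach N2.
n2 = Node({"id=2": 800})
n2.reachable = False

print(read(n2, "id=2", "AP"))    # stale 800, but the system stays available
try:
    read(n2, "id=2", "CP")
except RuntimeError as e:
    print(e)                     # consistent, but this read is unavailable
```
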
<p><strong>Scenario 2: Single-leader system for read and write operations</strong></p>
<p>In a single-leader system, all read and write operations come to the leader, while other nodes remain synchronized with the leader and act as standby nodes in case the leader fails.</p>
<p>The challenge arises if the leader becomes disconnected from the cluster or clients cannot connect to it due to a network partition. In such cases, the system cannot process write requests until a new leader is elected, making the system <strong>consistent</strong> but not <strong>available</strong> during the transition.</p>
<p>However, if the system allows reads from a read replica, it can respond even when the leader is unreachable, which makes the system <strong>highly available</strong> but <strong>not consistent</strong> for reads.</p>
<p>A single-leader system that serves both reads and writes from the leader <strong>should not</strong> be classified as <strong>highly available</strong>.</p>
<p><strong>RDBMS(MySQL, Oracle, MS SQL Server, etc)</strong></p>
<p>It’s a no-brainer that all <strong>RDBMS are consistent</strong>, as all reads and writes go to a single node/server.</p>
<p>How about availability? You might say it is a single server and hence a single point of failure, so how is it categorized under Availability?</p>
<p>As noted earlier, CAP availability is not the same as the day-to-day availability/downtime we usually talk about. In a single-node system there can be no network partition, so as long as the node is up it will always return a successful response for any read/write operation, and is therefore available.</p>
<p>Thus, an RDBMS can be both highly available and consistent.</p>
<hr />
<h3 id="heading-trade-offs-in-cap-theorem"><strong>Trade-offs in CAP Theorem</strong></h3>
<p>The CAP theorem highlights three trade-off scenarios in distributed systems:</p>
<ol>
<li><p><strong>Consistency and Availability (CA):</strong><br /> Ensures identical data across all nodes and responsiveness to every request, but only while the network is reliable; such a system cannot survive a partition.</p>
</li>
<li><p><strong>Consistency and Partition Tolerance (CP):</strong><br /> Prioritizes data consistency across nodes despite network partitions. The system may become temporarily unavailable to preserve data integrity.</p>
</li>
<li><p><strong>Availability and Partition Tolerance (AP):</strong><br /> Focuses on staying operational during network disruptions. Sacrifices strict consistency, accepting temporary data inconsistencies to ensure accessibility.</p>
</li>
</ol>
<hr />
<h3 id="heading-practical-implications"><strong>Practical Implications</strong></h3>
<p>In real-world applications, the choice between consistency, availability, and partition tolerance depends on the specific use case:</p>
<ul>
<li><p><strong>Financial Systems:</strong> Strong consistency is critical to ensure accurate transactions.</p>
</li>
<li><p><strong>Social Media Platforms:</strong> Prioritize availability, allowing users to interact with slightly stale data.</p>
</li>
<li><p><strong>Global Systems:</strong> Partition tolerance is essential to maintain operations across distributed regions.</p>
</li>
</ul>
<p>Understanding the CAP theorem and its trade-offs helps engineers design systems that align with the unique requirements of their applications, ensuring reliability and performance in distributed environments.</p>
<hr />
<h3 id="heading-probing-the-cap-theorem"><strong>Probing the CAP Theorem</strong></h3>
<ol>
<li><p><strong>Can you only have 2 out of 3 CAP properties?</strong><br /> No, CAP means you must choose between <strong>Consistency</strong> and <strong>Availability</strong> during a partition, not abandon one entirely.</p>
</li>
<li><p><strong>Does partition tolerance eliminate partition challenges?</strong><br /> No, it ensures operation during partitions but doesn’t resolve consistency or availability issues.</p>
</li>
<li><p><strong>Example of a non-partition-tolerant system:</strong><br /> A centralized database or a multi-node system with synchronous replication halts during partitions due to dependency on full communication.</p>
</li>
<li><p><strong>How to make systems partition-tolerant?</strong></p>
<ul>
<li><p>Use <strong>eventual consistency</strong> to allow independent node decisions and reconcile later.</p>
</li>
<li><p>Adopt <strong>asynchronous replication</strong> to accept writes without waiting for acknowledgment.</p>
</li>
<li><p>Employ quorum-based systems for majority agreement.</p>
</li>
</ul>
</li>
<li><p><strong>Is partition tolerance optional?</strong><br /> No, distributed systems must handle partitions; the trade-off is between consistency and availability.</p>
</li>
<li><p><strong>What are CA systems?</strong><br /> CA systems prioritize consistency and availability but fail during partitions, making them non-partition-tolerant.</p>
</li>
<li><p><strong>Does 99.999% uptime mean high availability?</strong><br /> Not in CAP terms. Availability requires every request to a non-failing node to receive a valid response, even during partitions.</p>
</li>
<li><p><strong>Do timeout errors count as availability?</strong><br /> No, errors or timeouts compromise availability in CAP’s definition.</p>
</li>
<li><p><strong>Does eventual consistency meet CAP's consistency?</strong><br /> No, CAP’s consistency refers to strong consistency, which eventual consistency does not satisfy.</p>
</li>
<li><p><strong>Does relaxing consistency always lead to eventual consistency?</strong><br />Not always; it might result in unresolved inconsistencies without conflict resolution mechanisms.</p>
</li>
<li><p><strong>Can strong consistency be achieved with a majority quorum?</strong><br />Yes, but it sacrifices availability, adhering to CAP’s trade-offs.</p>
</li>
<li><p><strong>Does CAP apply to microservices?</strong><br />Yes, CAP principles are relevant to microservices as well as distributed databases.</p>
</li>
<li><p><strong>What if partition tolerance is ignored?</strong><br />Ignoring partition tolerance works in systems with reliable networks but risks failure during real-world partitions.</p>
</li>
<li><p><strong>When can partition tolerance be ignored?</strong><br />In tightly controlled environments (e.g., single-node systems or highly reliable networks), partitions are negligible. Examples: MySQL on a single server or Google Spanner with controlled infrastructure.</p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Understanding Caching and Cache Strategies]]></title><description><![CDATA[In the world of software engineering and distributed systems, caching is a fundamental technique for improving performance and scalability. By storing frequently accessed data closer to the user or application, caching reduces latency, minimizes load...]]></description><link>https://anishratnawat.com/understanding-caching-and-cache-strategies</link><guid isPermaLink="true">https://anishratnawat.com/understanding-caching-and-cache-strategies</guid><category><![CDATA[caching]]></category><category><![CDATA[caching strategies]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Sat, 03 Aug 2024 18:30:00 GMT</pubDate><content:encoded><![CDATA[<p>In the world of software engineering and distributed systems, caching is a fundamental technique for improving performance and scalability. By storing frequently accessed data closer to the user or application, caching reduces latency, minimizes load on backend systems, and enhances the overall user experience. In this article, we’ll explore the basics of caching, common cache strategies, and best practices for implementing an effective caching solution.</p>
<hr />
<h2 id="heading-what-is-caching">What is Caching?</h2>
<p>Caching is the process of storing a copy of data in a temporary storage location, called a cache, so that it can be retrieved more quickly on subsequent requests. Caches are typically placed in-memory, which allows for faster read/write operations compared to disk-based storage or database queries.</p>
<p>Caching is widely used in various layers of an application stack, including:</p>
<ul>
<li><p><strong>Database caching:</strong> To reduce query execution time.</p>
</li>
<li><p><strong>Application caching:</strong> To store results of expensive computations.</p>
</li>
<li><p><strong>Content delivery network (CDN):</strong> To cache static resources like images, CSS, and JavaScript files closer to the user.</p>
</li>
</ul>
<hr />
<h2 id="heading-cache-strategies">Cache Strategies</h2>
<h3 id="heading-1-write-through">1. <strong>Write-through</strong></h3>
<p>In the write-through strategy, every write operation is applied to both the cache and the underlying data store. This ensures that the cache and the database remain consistent.</p>
<ul>
<li><p><strong>Process:</strong></p>
<ol>
<li><p>Write data to the cache.</p>
</li>
<li><p>Propagate the write to the database.</p>
</li>
</ol>
</li>
<li><p><strong>Pros:</strong></p>
<ul>
<li>Ensures consistency between cache and database.</li>
</ul>
</li>
<li><p><strong>Cons:</strong></p>
<ul>
<li><p>Slower write operations due to dual writes.</p>
</li>
<li><p>Potentially redundant cache entries if the data is infrequently read.</p>
</li>
</ul>
</li>
</ul>
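<p>A minimal sketch of write-through, with plain dicts standing in for the real cache and database:</p>

```python
# Write-through sketch: every write hits the cache and the database in
# the same operation, so the two never diverge. `cache` and `db` are
# plain dicts standing in for real stores.

cache, db = {}, {}

def write_through(key, value):
    cache[key] = value   # 1. write to the cache
    db[key] = value      # 2. propagate to the database (synchronously)

def read(key):
    return cache.get(key, db.get(key))

write_through("user:1", {"salary": 1000})
assert cache["user:1"] == db["user:1"]   # cache and database stay consistent
```
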
<hr />
<h3 id="heading-2-write-back-write-behind">2. <strong>Write-back (Write-behind)</strong></h3>
<p>In this strategy, write operations are performed on the cache and written to the datastore asynchronously later.</p>
<ul>
<li><p><strong>Process:</strong></p>
<ol>
<li><p>Write data to the cache.</p>
</li>
<li><p>Periodically flush changes from the cache to the database.</p>
</li>
</ol>
</li>
<li><p><strong>Pros:</strong></p>
<ul>
<li><p>Faster writes as only the cache is updated initially.</p>
</li>
<li><p>Reduces write load on the database.</p>
</li>
</ul>
</li>
<li><p><strong>Cons:</strong></p>
<ul>
<li>Risk of data loss if the cache is not properly persisted before failure.</li>
</ul>
</li>
</ul>
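<p>A minimal sketch of write-back, again with plain dicts as stand-ins; the <code>dirty</code> set tracks what has not yet been flushed:</p>

```python
# Write-back (write-behind) sketch: writes land in the cache immediately,
# and a dirty set records which keys still need flushing to the database.

cache, db, dirty = {}, {}, set()

def write_back(key, value):
    cache[key] = value
    dirty.add(key)            # database update is deferred

def flush():
    for key in dirty:
        db[key] = cache[key]  # periodic flush to the datastore
    dirty.clear()

write_back("user:1", 1000)
# db is stale until flush() runs -- this window is the data-loss risk
# mentioned above if the cache fails before persisting.
flush()
```
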
<hr />
<h3 id="heading-3-write-around">3. <strong>Write-Around</strong></h3>
<p>In this strategy, write operations go directly to the datastore, bypassing the cache. Reads then use a cache-aside load.</p>
<h3 id="heading-cache-aside-load"><strong>Cache-aside Load</strong></h3>
<p>On a cache miss, the data is loaded into the cache. The application code is responsible for checking the cache first before fetching data from the source of truth (e.g., a database).</p>
<ul>
<li><p><strong>Process:</strong></p>
<ol>
<li><p>Check if the data is in the cache.</p>
</li>
<li><p>If found, return the data.</p>
</li>
<li><p>If not, fetch the data from the database, store it in the cache, and return it.</p>
</li>
</ol>
</li>
<li><p><strong>Pros:</strong></p>
<ul>
<li><p>Simple to implement.</p>
</li>
<li><p>Provides fine-grained control over cache behavior.</p>
</li>
</ul>
</li>
<li><p><strong>Cons:</strong></p>
<ul>
<li>Potential for stale data if not properly invalidated.</li>
</ul>
</li>
</ul>
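<p>The cache-aside steps above can be sketched as follows (the <code>db_fetch</code> helper is a hypothetical stand-in for a real database query):</p>

```python
# Cache-aside load sketch: check the cache first, and only on a miss
# fetch from the source of truth and populate the cache.

cache = {}

def db_fetch(key):                 # hypothetical stand-in for a DB query
    return {"user:1": "Anish"}.get(key)

def get(key):
    if key in cache:               # 1. check the cache
        return cache[key]          # 2. hit: return it
    value = db_fetch(key)          # 3. miss: go to the source of truth
    cache[key] = value             #    store it for future requests
    return value

get("user:1")                      # first call: miss, loads from the db
get("user:1")                      # second call: served from the cache
```
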
<hr />
<h2 id="heading-cache-eviction-policies">Cache Eviction Policies</h2>
<p>Caching systems have limited storage, so eviction policies determine which data to remove when the cache is full. Common eviction policies include:</p>
<ol>
<li><p><strong>Least Recently Used (LRU):</strong> Evicts the least recently accessed items first.</p>
</li>
<li><p><strong>Least Frequently Used (LFU):</strong> Evicts items accessed the least number of times.</p>
</li>
<li><p><strong>First In, First Out (FIFO):</strong> Evicts items in the order they were added.</p>
</li>
<li><p><strong>Random:</strong> Evicts random items to reduce complexity.</p>
</li>
</ol>
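<p>As an illustration, LRU eviction can be sketched with Python's <code>OrderedDict</code> (a simplified model, not a production cache):</p>

```python
# LRU eviction sketch: the least recently accessed key is evicted once
# capacity is exceeded. OrderedDict keeps keys in access order.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)        # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False) # evict the least recently used

lru = LRUCache(2)
lru.put("a", 1); lru.put("b", 2)
lru.get("a")          # "a" is now most recently used
lru.put("c", 3)       # cache is full, so "b" is evicted
```
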
<hr />
<h2 id="heading-best-practices-for-caching">Best Practices for Caching</h2>
<ol>
<li><p><strong>Use Appropriate Expiration Times:</strong></p>
<ul>
<li>Set reasonable TTL values to avoid serving stale data.</li>
</ul>
</li>
<li><p><strong>Monitor Cache Performance:</strong></p>
<ul>
<li>Continuously track hit/miss rates to evaluate effectiveness.</li>
</ul>
</li>
<li><p><strong>Implement Cache Invalidation Strategies:</strong></p>
<ul>
<li>Use mechanisms like versioning or explicit invalidation to ensure data consistency.</li>
</ul>
</li>
<li><p><strong>Avoid Over-Caching:</strong></p>
<ul>
<li>Cache only what is necessary to prevent excessive memory usage.</li>
</ul>
</li>
<li><p><strong>Secure Your Cache:</strong></p>
<ul>
<li>Use encryption and access controls to protect sensitive data.</li>
</ul>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Basics of Content Delivery Network]]></title><description><![CDATA[A CDN is a distributed network of servers strategically placed across different geographical locations. These servers work together to deliver content, such as HTML pages, JavaScript files, stylesheets, images, and videos, to users based on their pro...]]></description><link>https://anishratnawat.com/basics-of-content-delivery-network</link><guid isPermaLink="true">https://anishratnawat.com/basics-of-content-delivery-network</guid><category><![CDATA[CDN]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Sat, 20 Jul 2024 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1737555529203/d195a1e4-e1c9-44d3-8586-ab0efea66b30.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A CDN is a distributed network of servers strategically placed across different geographical locations. These servers work together to deliver content, such as HTML pages, JavaScript files, stylesheets, images, and videos, to <strong>users</strong> based on their <strong>proximity</strong> to a server.</p>
<p>By reducing the distance between users and servers, CDNs minimize latency, improve load times, and enhance the overall user experience.</p>
<h3 id="heading-how-does-a-cdn-work">How Does a CDN Work?</h3>
<p>At its core, a CDN works by caching content on multiple servers spread across various geographical locations, also known as edge servers. Here’s a step-by-step breakdown of how a CDN operates:</p>
<ol>
<li><p><strong>Content Caching:</strong> With a push CDN, the origin server uploads content to the CDN’s edge servers in advance.</p>
</li>
<li><p><strong>User Request:</strong> With a pull CDN, when a user requests a resource (e.g., a webpage or an image), the request is routed to the nearest CDN edge server based on the user's location.</p>
</li>
<li><p><strong>Cache Lookup:</strong> The edge server checks if the requested content is cached.</p>
<ul>
<li><p>If cached, the content is delivered directly to the user, ensuring minimal latency.</p>
</li>
<li><p>If not cached, the edge server retrieves the content from the origin server, serves it to the user, and caches it for future requests.</p>
</li>
</ul>
</li>
<li><p><strong>Content Delivery:</strong> The content is delivered to the user from the edge server, reducing load on the origin server and enhancing the user experience.</p>
</li>
</ol>
<p>This distributed approach ensures faster load times, reduced bandwidth costs, and improved reliability, even during traffic spikes.</p>
<h3 id="heading-cdn-architecture">CDN Architecture</h3>
<p>A typical CDN architecture consists of the following components:</p>
<ol>
<li><p><strong>Origin Server:</strong> The central repository where the original content is stored, typically the website’s hosting server.</p>
</li>
<li><p><strong>Edge Servers:</strong> Distributed servers located in various geographical locations. These servers cache content to serve users from the nearest possible location.</p>
</li>
<li><p><strong>Points of Presence (PoPs):</strong> Physical data centers housing edge servers, strategically placed to maximize coverage and minimize latency.</p>
</li>
<li><p><strong>Load Balancer:</strong> Distributes incoming traffic across multiple servers to prevent overloading any single server.</p>
</li>
<li><p><strong>Content Routing Mechanism:</strong> Uses algorithms to direct user requests to the most optimal edge server based on factors like proximity, server health, and cache availability.</p>
</li>
<li><p><strong>Analytics and Monitoring Tools:</strong> Collect data on performance metrics, user behavior, and system health, providing actionable insights for optimization.</p>
</li>
</ol>
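<p>As a toy illustration of the content-routing mechanism, the sketch below picks the PoP nearest to the user by straight-line distance. Real CDNs combine DNS/anycast routing with server health and cache state; the PoP names and coordinates here are made up:</p>

```python
# Proximity-based routing sketch: choose the PoP with the smallest
# straight-line distance to the user. Purely illustrative -- production
# routing also weighs server load, health, and cache availability.
import math

pops = {
    "us-east": (38.9, -77.0),   # hypothetical PoP coordinates (lat, lon)
    "eu-west": (53.3, -6.2),
    "ap-south": (19.1, 72.9),
}

def nearest_pop(user_lat, user_lon):
    def dist(name):
        lat, lon = pops[name]
        return math.hypot(lat - user_lat, lon - user_lon)
    return min(pops, key=dist)

print(nearest_pop(48.8, 2.3))   # a user near Paris -> "eu-west"
```
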
<p>CDN providers typically do not dedicate a single edge server to a specific origin server. Instead, edge servers are shared among multiple origin servers and content providers. This shared infrastructure approach allows CDNs to maximize resource utilization, distribute traffic efficiently, and offer cost-effective solutions to their clients.</p>
<p>However, some enterprise-level CDN services, such as Akamai or AWS CloudFront, may provide dedicated or isolated resources for specific high-demand clients. This could include private caching configurations or dedicated PoPs (Points of Presence) for clients with unique security, compliance, or performance requirements.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1737555392068/c2d43ecc-e35c-4596-a49c-7371f6aeb3ee.png" alt class="image--center mx-auto" /></p>
<p>This modular architecture enables CDNs to deliver content efficiently, handle high traffic volumes, and provide resilience against server outages or DDoS attacks.</p>
<h3 id="heading-key-benefits-of-using-a-cdn">Key Benefits of Using a CDN</h3>
<ul>
<li><p><strong>Reduced Latency:</strong> Faster load times for end-users.</p>
</li>
<li><p><strong>Improved Availability:</strong> Enhanced uptime and reliability.</p>
</li>
<li><p><strong>Reduced Server Load:</strong> Offloads traffic from the origin server.</p>
</li>
<li><p><strong>Better Scalability:</strong> Handles traffic spikes efficiently.</p>
</li>
</ul>
<h3 id="heading-push-vs-pull-cdns">Push vs. Pull CDNs</h3>
<p>CDNs can operate in two primary modes: push and pull. Each mode has its unique use cases, advantages, and trade-offs.</p>
<h4 id="heading-push-cdn">Push CDN</h4>
<p>In a push CDN, the content is manually uploaded to the CDN's servers by the content provider. The provider is responsible for ensuring that the latest version of the content is available on the CDN.</p>
<p><strong>How It Works:</strong></p>
<ol>
<li><p>The website owner uploads content (e.g., images, videos) to the CDN.</p>
</li>
<li><p>The CDN stores this content in its servers.</p>
</li>
<li><p>When a user requests the content, it is served directly from the CDN’s servers.</p>
</li>
</ol>
<p><strong>Example Use Case:</strong> A media company hosting high-quality videos might use a push CDN to pre-upload their videos to ensure users always get the best experience without latency.</p>
<h4 id="heading-pull-cdn">Pull CDN</h4>
<p>In a pull CDN, content is fetched dynamically from the origin server and cached on the CDN’s edge servers when a user requests it for the first time.</p>
<p><strong>How It Works:</strong></p>
<ol>
<li><p>A user requests content.</p>
</li>
<li><p>If the content is not already cached in the CDN, it is fetched from the origin server.</p>
</li>
<li><p>The fetched content is cached for subsequent requests.</p>
</li>
</ol>
<p><strong>Example Use Case:</strong> An e-commerce website with frequently updated product images and descriptions can leverage a pull CDN to ensure users always receive the latest content.</p>
<h3 id="heading-key-differences-between-push-and-pull-cdns">Key Differences Between Push and Pull CDNs</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Feature</th><th>Push CDN</th><th>Pull CDN</th></tr>
</thead>
<tbody>
<tr>
<td><strong>Content Upload</strong></td><td>Manual push</td><td>Automatic pull by CDN</td></tr>
<tr>
<td><strong>Best For</strong></td><td>Static, infrequently updated content</td><td>Dynamic, frequently updated content</td></tr>
<tr>
<td><strong>Initial Latency</strong></td><td>Low</td><td>Higher (during the first request)</td></tr>
<tr>
<td><strong>Management Effort</strong></td><td>Higher</td><td>Lower</td></tr>
<tr>
<td><strong>Cost Predictability</strong></td><td>More predictable</td><td>Depends on cache hit/miss ratio</td></tr>
</tbody>
</table>
</div><h3 id="heading-hybrid-approach">Hybrid Approach</h3>
<p>Some CDN providers offer a hybrid model, combining the best of both push and pull CDNs. This allows businesses to push critical static assets while relying on pull mechanisms for dynamic content.</p>
<h3 id="heading-real-world-cdn-providers">Real-World CDN Providers</h3>
<ol>
<li><p><strong>Cloudflare:</strong> Primarily operates as a pull CDN, suitable for dynamic websites.</p>
</li>
<li><p><strong>Akamai:</strong> Offers both push and pull CDN configurations for enterprise-level applications.</p>
</li>
<li><p><strong>Amazon CloudFront:</strong> Supports a hybrid approach with extensive customization options.</p>
</li>
</ol>
<h3 id="heading-how-cdn-sends-analytics-back-to-the-server">How CDN Sends Analytics Back to the Server</h3>
<p>CDNs not only deliver content efficiently but also provide valuable analytics to help content providers monitor performance and user behavior. Here's how it works:</p>
<ol>
<li><p><strong>Data Collection:</strong> The CDN edge servers collect data on metrics such as user location, content type, request times, cache hits and misses, and bandwidth usage.</p>
</li>
<li><p><strong>Aggregation:</strong> This data is aggregated in real-time or near real-time to provide a comprehensive view of content delivery performance.</p>
</li>
<li><p><strong>Transmission to the Origin Server:</strong> The CDN transmits this aggregated data back to the origin server or a centralized analytics system, often via APIs or dashboards.</p>
</li>
<li><p><strong>Actionable Insights:</strong> Content providers can use these insights to optimize delivery strategies, improve cache efficiency, and enhance user experience.</p>
</li>
</ol>
<p>For example, a streaming platform can monitor which regions have the most users experiencing latency, allowing them to strategically deploy additional edge servers in those locations. Analytics also help in identifying popular content, assisting in targeted marketing campaigns.</p>
]]></content:encoded></item><item><title><![CDATA[Latency Numbers reference for System Design]]></title><description><![CDATA[Latency numbers can provide valuable context during system design , especially when discussing performance optimization, scalability, and trade-offs. Here are some common latency numbers worth referen]]></description><link>https://anishratnawat.com/latency-numbers-reference-for-system-design</link><guid isPermaLink="true">https://anishratnawat.com/latency-numbers-reference-for-system-design</guid><category><![CDATA[latency]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Sat, 20 Jul 2024 18:30:00 GMT</pubDate><content:encoded><![CDATA[<p>Latency numbers can provide valuable context during system design , especially when discussing performance optimization, scalability, and trade-offs. Here are some common latency numbers worth referencing:</p>
<h3><strong>Modern Hardware Limits</strong></h3>
<p>Today’s servers have massive capacities that change the "distributed vs. single machine" trade-off.</p>
<ul>
<li><p><strong>Compute/Memory:</strong> Standard high-end instances (like AWS M6i) offer 128 vCPUs and 512 GB of RAM, while specialized instances can go up to <strong>24 TB of RAM</strong>.</p>
</li>
<li><p><strong>Storage:</strong> Local SSDs can handle <strong>60 TB</strong> on a single instance, and HDDs can reach over <strong>300 TB</strong>.</p>
</li>
<li><p><strong>Networking:</strong> 25–100 Gbps is standard within data centers. Latency is sub-1ms within an Availability Zone (AZ) and ~1-2ms between AZs.</p>
</li>
</ul>
<hr />
<h3><strong>Component Capacities (Single Node)</strong></h3>
<table style="min-width:50px"><colgroup><col style="min-width:25px"></col><col style="min-width:25px"></col></colgroup><tbody><tr><td><p><strong>Component</strong></p></td><td><p><strong>Modern Capacity / Throughput</strong></p></td></tr><tr><td><p><strong>Server Memory (High-end)</strong></p></td><td><p>Up to 4 TB (Standard) to 24 TB (Metal)</p></td></tr><tr><td><p><strong>Local SSD Storage</strong></p></td><td><p>60 TB+ (e.g., AWS i3en instances)</p></td></tr><tr><td><p><strong>Database Storage (Single Node)</strong></p></td><td><p>5-10 TB (before sharding is strictly necessary)</p></td></tr><tr><td><p><strong>SQL Writes (Postgres/MySQL)</strong></p></td><td><p>10k - 50k writes/sec (well-tuned)</p></td></tr><tr><td><p><strong>SQL Reads (Indexed)</strong></p></td><td><p>100k+ reads/sec</p></td></tr><tr><td><p><strong>Redis Throughput</strong></p></td><td><p>100k - 1M operations/sec</p></td></tr><tr><td><p><strong>App Server Connections</strong></p></td><td><p>10k - 50k concurrent connections (Async I/O)</p></td></tr><tr><td><p><strong>Network Bandwidth</strong></p></td><td><p>25 Gbps - 100 Gbps</p></td></tr></tbody></table>

<h3><strong>Key Rules of Thumb</strong></h3>
<ul>
<li><p><strong>The "1TB Rule":</strong> If your total dataset is under 1TB, it can likely fit entirely in the RAM of a few high-memory cache nodes or on the disk of a single modern database instance.</p>
</li>
<li><p><strong>The "Sharding Threshold":</strong> Don't suggest sharding a database for <em>storage</em> reasons unless you exceed <strong>5-10 TB</strong>. Don't shard for <em>write throughput</em> unless you exceed <strong>20k-50k writes/second</strong>.</p>
</li>
<li><p><strong>The "Cache-First" Fallacy:</strong> Modern NVMe SSDs are so fast (10-50μs) that if your database query is a simple primary key lookup, you might not even need Redis for performance; use it for scaling read-heavy traffic or reducing DB load instead.</p>
</li>
<li><p><strong>Concurrency:</strong> One single modern application server can handle almost any "mid-sized" startup’s total traffic. When designing for millions of users, think in dozens of servers, not thousands.</p>
</li>
</ul>
<hr />
<h3><strong>Basic Operations</strong></h3>
<ul>
<li><p><strong>L1 cache reference:</strong> ~1 nanoseconds</p>
</li>
<li><p><strong>L2 cache reference:</strong> ~7 nanoseconds</p>
</li>
<li><p><strong>Main memory (RAM) reference:</strong> ~100 nanoseconds</p>
</li>
<li><p><strong>SSD I/O (read/write):</strong> ~100 microseconds</p>
</li>
<li><p><strong>Disk I/O (HDD, seek):</strong> ~10 milliseconds</p>
</li>
</ul>
<p>When you read from a database or a remote cache, you aren't just paying for the time it takes to find the data; you are paying for the round-trip journey.</p>
<ul>
<li><p><strong>Remote Cache (e.g., Redis on a separate VM):</strong></p>
<ul>
<li><p><strong>Internal Processing:</strong> ~0.1 ms</p>
</li>
<li><p><strong>Network Overhead:</strong> ~0.5 ms to 1.0 ms (within the same Availability Zone)</p>
</li>
<li><p><strong>Total:</strong> <strong>~1.1 ms</strong></p>
</li>
</ul>
</li>
<li><p><strong>Remote Database (e.g., Postgres/MySQL):</strong></p>
<ul>
<li><p><strong>Internal Processing:</strong> ~5 ms to 50 ms (index lookup + disk I/O)</p>
</li>
<li><p><strong>Network Overhead:</strong> ~0.5 ms to 1.0 ms</p>
</li>
<li><p><strong>Total:</strong> <strong>~5.5 ms to 51 ms</strong></p>
</li>
</ul>
</li>
</ul>
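<p>These round-trip totals are simple arithmetic; the sketch below just adds the internal processing time to the network overhead from the figures above:</p>

```python
# Back-of-envelope totals from the numbers above (all values in ms):
# a remote read pays its internal processing time plus the network
# round trip. Figures are the low/high bounds quoted in the article.

net_rtt = (0.5, 1.0)        # same-AZ round trip, low/high
cache_internal = 0.1        # in-memory lookup (e.g., Redis)
db_internal = (5.0, 50.0)   # index lookup + disk I/O range

cache_total = (cache_internal + net_rtt[0], cache_internal + net_rtt[1])
db_total = (db_internal[0] + net_rtt[0], db_internal[1] + net_rtt[1])

print(f"remote cache: ~{cache_total[0]:.1f}-{cache_total[1]:.1f} ms")
print(f"remote db:    ~{db_total[0]:.1f}-{db_total[1]:.1f} ms")
```
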
<hr />
<h3><strong>Data Processing</strong></h3>
<ul>
<li><p><strong>Reading 1 MB from RAM:</strong> ~250 microseconds</p>
</li>
<li><p><strong>Reading 1 MB from SSD:</strong> ~1 millisecond</p>
</li>
<li><p><strong>Reading 1 MB from HDD:</strong> ~10 milliseconds</p>
</li>
</ul>
<hr />
<h3><strong>Network Latencies</strong></h3>
<ul>
<li><p><strong>1 KB data transfer on 1 Gbps network:</strong> ~10 microseconds</p>
</li>
<li><p><strong>Round trip within the same AZ:</strong> &lt; 1 millisecond</p>
</li>
<li><p><strong>Round trip between cross AZ (same region):</strong> ~ 1-2 ms</p>
</li>
<li><p><strong>Round trip between two data centers (different regions):</strong> ~60-200 ms, depending on distance</p>
</li>
<li><p><strong>Intercontinental round trip:</strong> ~150 ms, depending on distance</p>
</li>
</ul>
<hr />
<h3><strong>Cloud Services</strong></h3>
<ul>
<li><p><strong>API gateway call latency:</strong> ~1-10 milliseconds</p>
</li>
<li><p><strong>Query on a NoSQL database (e.g., DynamoDB):</strong> ~5-20 milliseconds</p>
</li>
<li><p><strong>Query on an SQL database:</strong> ~5-10 milliseconds for simple queries; complex queries can take seconds.</p>
</li>
</ul>
<hr />
<h2>FAQ</h2>
<ol>
<li><p><strong>Main memory (RAM) reference is 100 nanoseconds but Reading 1 MB from RAM is 250 microseconds, explain ?</strong></p>
<p><strong>Answer:</strong></p>
<p><strong>100 ns:</strong> Time to access a single memory location (latency), i.e., fetching a small chunk of data (e.g., a 64-byte cache line).</p>
<p><strong>250 µs:</strong> Time to read 1 MB, including latency and transfer time.</p>
<ul>
<li><p>Modern RAM modules have high bandwidth, often in the range of <strong>tens of GB/s</strong>. For example:</p>
</li>
<li><p>Assume a memory bandwidth of <strong>20 GB/s</strong> (DDR4/DDR5 range).</p>
</li>
<li><p>Time to transfer 1 MB = 1 MB / 20 GB/s = 2^20 bytes / (20×10^9 bytes/s) ≈ <strong>50 μs</strong></p>
</li>
</ul>
<p>However, the transfer process also incurs <strong>latency overheads</strong> for accessing multiple addresses and managing the memory bus, which is why the total time to read 1 MB is closer to <strong>~250 microseconds</strong> rather than the raw bandwidth estimate.</p>
</li>
<li><p><strong>Disk I/O (HDD, seek) is 10 milliseconds and Reading 1 MB from HDD is also 10 milliseconds, why ?</strong></p>
<p><strong>Answer:</strong></p>
<p>Disk I/O refers to the <strong>seek time</strong>, which is the delay required for the hard disk drive (HDD) to position its read/write head over the correct track on the spinning disk. This latency happens <strong>before any data is read</strong> and is <strong>independent of the data size</strong>.</p>
<p><strong>Reading 1 MB from HDD: ~10 milliseconds</strong></p>
<ul>
<li><p>This is the total time required to read <strong>1 MB of data</strong> from the disk, including:</p>
<ol>
<li><p><strong>Seek time (~10 ms)</strong>: Positioning the read head.</p>
</li>
<li><p><strong>Data transfer time</strong>: Time to physically transfer 1 MB from the spinning disk to memory.</p>
</li>
</ol>
<ul>
<li><p>Modern HDDs have sequential read speeds of ~100 MB/s. Therefore:</p>
<ul>
<li>Transfer time for 1 MB = 1 MB / 100 MB/s = 0.01 seconds = <strong>10 ms</strong></li>
</ul>
</li>
</ul>
</li>
</ul>
<p>For small reads (e.g., a few KB or even 1 byte), the <strong>seek time dominates</strong>, so the total latency is still close to 10 ms.</p>
<p>For larger reads (e.g., 1 MB), the <strong>transfer time adds to the seek time</strong>, but because the transfer speed is high, it doesn’t increase latency significantly for moderate data sizes like 1 MB.</p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Understanding Gateway, Load Balancer, Forward Proxy, and Reverse Proxy]]></title><description><![CDATA[Gateway
A gateway acts as a single entry point into a system. It is commonly used in microservices architectures to route client requests to appropriate backend services. Gateways often incorporate additional functionalities like:

Authentication and...]]></description><link>https://anishratnawat.com/understanding-gateway-load-balancer-forward-proxy-and-reverse-proxy</link><guid isPermaLink="true">https://anishratnawat.com/understanding-gateway-load-balancer-forward-proxy-and-reverse-proxy</guid><category><![CDATA[Load Balancer]]></category><category><![CDATA[gateway]]></category><category><![CDATA[forward-proxy]]></category><category><![CDATA[Reverse Proxy]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Sat, 15 Jun 2024 18:30:00 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-gateway">Gateway</h2>
<p>A <strong>gateway</strong> acts as a single entry point into a system. It is commonly used in microservices architectures to route client requests to appropriate backend services. Gateways often incorporate additional functionalities like:</p>
<ul>
<li><p><strong>Authentication and Authorization:</strong> Ensuring only authorized users access certain resources.</p>
</li>
<li><p><strong>Protocol Translation:</strong> Converting protocols like HTTP to WebSocket or gRPC.</p>
</li>
<li><p><strong>Request Aggregation:</strong> Combining responses from multiple services into a single response.</p>
</li>
<li><p><strong>Service Discovery:</strong> Finding and locating available service instances within a distributed system.</p>
</li>
<li><p><strong>Rate Limiting:</strong> Limiting the number of requests an API will handle in a given time frame.</p>
</li>
<li><p><strong>SSL Termination:</strong> Decrypting encrypted traffic before passing it along to a web server.</p>
</li>
</ul>
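<p>As a sketch of one of these responsibilities, rate limiting is often implemented as a token bucket. The snippet below is a minimal, illustrative version, not a production gateway component:</p>
<pre><code class="lang-python">import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens/second."""
    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        # Refill tokens for the time elapsed since the last check.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, rate=1.0)  # burst of 3, then 1 request/second
print([bucket.allow() for _ in range(5)])   # first 3 allowed, the rest rejected
</code></pre>
<p>A gateway would consult such a bucket per client or per API key before forwarding a request.</p>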
<p><strong>Popular Tools:</strong></p>
<ul>
<li><p><strong>API Gateway:</strong> Tools like Kong, Apigee, or AWS API Gateway manage APIs by abstracting backend services and enforcing policies.</p>
</li>
<li><p><strong>Service Gateway:</strong> Tools like Istio function within service meshes, facilitating inter-service communication and applying policies.</p>
</li>
</ul>
<hr />
<h2 id="heading-load-balancer">Load Balancer</h2>
<p>A <strong>load balancer</strong> distributes incoming network traffic across multiple servers to ensure high availability and reliability. It helps achieve fault tolerance, scalability, and optimal resource utilization.</p>
<p><strong>Types of Load Balancers:</strong></p>
<ol>
<li><p><strong>Layer 4 Load Balancers:</strong> Operate at the transport layer (TCP/UDP) and use information like IP address and port for routing.</p>
<ul>
<li>Example: AWS Elastic Load Balancer (Classic).</li>
</ul>
</li>
<li><p><strong>Layer 7 Load Balancers:</strong> Operate at the application layer, making routing decisions based on HTTP headers, URLs, and other application-level data.</p>
<ul>
<li>Example: NGINX, HAProxy.</li>
</ul>
</li>
</ol>
<p><strong>Load Balancing Algorithms:</strong></p>
<ul>
<li><p><strong>Round Robin:</strong> Requests are distributed sequentially across servers.</p>
</li>
<li><p><strong>Least Connections:</strong> Routes to the server with the fewest active connections.</p>
</li>
<li><p><strong>IP Hash:</strong> Routes based on client IP, ensuring session persistence.</p>
</li>
</ul>
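<p>These algorithms can be sketched in a few lines of Python; the server names are hypothetical and the code is purely illustrative:</p>
<pre><code class="lang-python">import hashlib
from itertools import cycle

servers = ["app-1", "app-2", "app-3"]  # hypothetical backend pool

# Round robin: hand out servers in a fixed rotation.
rotation = cycle(servers)
def round_robin():
    return next(rotation)

# Least connections: pick the server with the fewest active connections.
active = {s: 0 for s in servers}
def least_connections():
    server = min(active, key=active.get)
    active[server] += 1
    return server

# IP hash: the same client IP always maps to the same server (sticky sessions).
def ip_hash(client_ip):
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

print([round_robin() for _ in range(4)])  # app-1, app-2, app-3, app-1
print(ip_hash("203.0.113.7") == ip_hash("203.0.113.7"))  # True: sticky
</code></pre>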
<hr />
<h2 id="heading-gateway-vs-load-balancers">Gateway vs Load Balancers</h2>
<p>Two scenarios help clear up the confusion between gateways and load balancers. A microservices example is used here because the distinction is most meaningful in that context.</p>
<p><strong>Scenario 1: You have a cluster of API Gateways</strong></p>
<p>User ---&gt; Load Balancer (provided by Cloud Providers like AWS or your own) ---&gt; API Gateway Cluster ---&gt; Service Discovery Agent (like <em>eureka</em>) ---&gt; Microservice A ---&gt; Client Side Load Balancer ---&gt; Microservice B</p>
<p><strong>Scenario 2: You have a <em>single API Gateway</em></strong></p>
<p>User ---&gt; API Gateway ---&gt; Service Discovery Agent (like <em>Eureka</em>) ---&gt; Microservice A ---&gt; Client Side Load Balancer -&gt; Microservice B</p>
<p>In Scenario 1, a load balancer is needed in front of the API gateway because multiple gateway instances run to handle large traffic volumes. The gateway itself can have several responsibilities to manage, so rather than burdening a single instance, the load balancer distributes incoming requests across the gateway cluster.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1737557188303/1ff802cb-e127-4d7a-a5d8-f499e2048887.png" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-forward-proxy">Forward Proxy</h2>
<p>A <strong>forward proxy</strong> acts on behalf of clients: it forwards client requests to the destination server while hiding the client's identity and IP address. It is typically used for:</p>
<ul>
<li><p><strong>Caching:</strong> Storing frequently accessed data to reduce latency.</p>
</li>
<li><p><strong>Anonymity:</strong> Masking the client's identity.</p>
</li>
<li><p><strong>Content Filtering:</strong> Blocking access to certain websites.</p>
</li>
</ul>
<p><strong>Use Case Example:</strong> An organization might use a forward proxy to allow employees to access the internet while restricting access to non-work-related sites.</p>
<hr />
<h2 id="heading-reverse-proxy">Reverse Proxy</h2>
<p>A <strong>reverse proxy</strong> operates on behalf of servers, handling requests from clients and forwarding them to appropriate backend servers. Common functionalities include:</p>
<ul>
<li><p><strong>Load Balancing:</strong> Distributing traffic among servers.</p>
</li>
<li><p><strong>SSL Termination:</strong> Offloading SSL decryption to reduce server load.</p>
</li>
<li><p><strong>Caching:</strong> Storing responses to reduce server load and latency for repeated requests.</p>
</li>
<li><p><strong>Security:</strong> Hiding backend server details and blocking malicious traffic.</p>
</li>
</ul>
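<p>The caching responsibility can be sketched as follows; <code>fetch_from_backend</code> is a hypothetical stand-in for a real upstream call:</p>
<pre><code class="lang-python"># Reverse-proxy-style response cache: serve repeated requests
# without hitting the backend again.
cache = {}
backend_hits = 0

def fetch_from_backend(path):
    global backend_hits
    backend_hits += 1
    return f"response for {path}"

def handle_request(path):
    if path not in cache:              # cache miss: forward to the backend
        cache[path] = fetch_from_backend(path)
    return cache[path]                 # cache hit: backend is not touched

handle_request("/users/1")
handle_request("/users/1")
print(backend_hits)  # 1: the second request was served from cache
</code></pre>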
<p><strong>Popular Tools:</strong> NGINX, Apache HTTP Server, Traefik.</p>
<p><strong>Use Case Example:</strong> A reverse proxy can sit in front of an application server, handling SSL termination, caching, and load balancing.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1737557312207/b0e428bc-f314-4181-81fd-7e3727a1bc30.png" alt class="image--center mx-auto" /></p>
]]></content:encoded></item><item><title><![CDATA[REST vs RPC vs HTTP vs TCP vs UDP: Understanding the Differences]]></title><description><![CDATA[REST, RPC, HTTP, TCP, and UDP—each operates at different levels of abstraction and serves different purposes in network communication:

📦 1. TCP (Transmission Control Protocol)

Type: Transport Layer Protocol (OSI Layer 4)

Purpose: Reliable, ordere...]]></description><link>https://anishratnawat.com/rest-vs-rpc-understanding-the-differences</link><guid isPermaLink="true">https://anishratnawat.com/rest-vs-rpc-understanding-the-differences</guid><category><![CDATA[restvsrpc]]></category><category><![CDATA[REST]]></category><category><![CDATA[RPC]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Sat, 04 May 2024 18:30:00 GMT</pubDate><content:encoded><![CDATA[<p><strong>REST</strong>, <strong>RPC</strong>, <strong>HTTP</strong>, <strong>TCP</strong>, and <strong>UDP</strong>—each operates at different levels of abstraction and serves different purposes in network communication:</p>
<hr />
<h3 id="heading-1-tcp-transmission-control-protocol">📦 1. <strong>TCP (Transmission Control Protocol)</strong></h3>
<ul>
<li><p><strong>Type</strong>: Transport Layer Protocol (OSI Layer 4)</p>
</li>
<li><p><strong>Purpose</strong>: Reliable, ordered, and error-checked delivery of data between applications</p>
</li>
<li><p><strong>Use Cases</strong>: Web (HTTP), Email (SMTP), FTP</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Connection-oriented</p>
</li>
<li><p>Guarantees packet delivery</p>
</li>
<li><p>Slower due to overhead (acknowledgements, retransmission, flow control)</p>
</li>
</ul>
</li>
</ul>
<hr />
<h3 id="heading-2-udp-user-datagram-protocol">💨 2. <strong>UDP (User Datagram Protocol)</strong></h3>
<ul>
<li><p><strong>Type</strong>: Transport Layer Protocol (OSI Layer 4)</p>
</li>
<li><p><strong>Purpose</strong>: Fast, connectionless communication</p>
</li>
<li><p><strong>Use Cases</strong>: Video streaming, online gaming, DNS, VoIP</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>No guarantee of delivery or order</p>
</li>
<li><p>No connection setup — lightweight and fast</p>
</li>
<li><p>Suitable for latency-sensitive apps</p>
</li>
</ul>
</li>
</ul>
<hr />
<h3 id="heading-3-http-hypertext-transfer-protocol">🌐 3. <strong>HTTP (Hypertext Transfer Protocol)</strong></h3>
<ul>
<li><p><strong>Type</strong>: Application Layer Protocol (built on TCP)</p>
</li>
<li><p><strong>Purpose</strong>: Transmit hypermedia (HTML, JSON, etc.) between clients and servers</p>
</li>
<li><p><strong>Use Cases</strong>: Web APIs, browsers, REST APIs</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Stateless, request-response protocol</p>
</li>
<li><p>Typically runs on port 80 (HTTP) or 443 (HTTPS)</p>
</li>
<li><p>Built on top of TCP</p>
</li>
</ul>
</li>
</ul>
<blockquote>
<p><strong>Note:</strong> HTTP is often used as the transport layer for both REST and RPC.</p>
</blockquote>
<hr />
<h3 id="heading-4-rpc-remote-procedure-call">🔁 4. <strong>RPC (Remote Procedure Call)</strong></h3>
<ul>
<li><p><strong>Type</strong>: Programming concept / communication pattern</p>
</li>
<li><p><strong>Purpose</strong>: Execute a function/procedure on a remote server as if it's local</p>
</li>
<li><p><strong>Use Cases</strong>: gRPC, Thrift, XML-RPC, JSON-RPC</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Client invokes remote methods directly</p>
</li>
<li><p>Abstracts transport layer details</p>
</li>
<li><p>Can be tightly coupled (harder to evolve over time)</p>
</li>
</ul>
</li>
</ul>
<p><strong>Note:</strong> gRPC uses Protocol Buffers (protobuf) for data serialisation, which encodes data in a compact binary format. Protobuf can also be used on its own over HTTP as a replacement for JSON payloads.</p>
<hr />
<h3 id="heading-5-rest-representational-state-transfer">🌱 5. <strong>REST (Representational State Transfer)</strong></h3>
<ul>
<li><p><strong>Type</strong>: Architectural style using HTTP</p>
</li>
<li><p><strong>Purpose</strong>: Build scalable and loosely-coupled web APIs</p>
</li>
<li><p><strong>Use Cases</strong>: Public APIs, microservices communication</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Resource-based (<code>GET /users/1</code>, <code>POST /orders</code>)</p>
</li>
<li><p>Stateless and cacheable</p>
</li>
<li><p>Uses HTTP verbs (GET, POST, PUT, DELETE)</p>
</li>
</ul>
</li>
</ul>
<hr />
<h3 id="heading-summary-comparison-table">🧠 Summary Comparison Table</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>TCP</td><td>UDP</td><td>HTTP</td><td>RPC</td><td>REST</td></tr>
</thead>
<tbody>
<tr>
<td>Layer</td><td>Transport</td><td>Transport</td><td>App</td><td>App concept</td><td>App concept</td></tr>
<tr>
<td>Reliability</td><td>Yes</td><td>No</td><td>Yes</td><td>Depends</td><td>Yes</td></tr>
<tr>
<td>Protocol Style</td><td>Stream</td><td>Datagram</td><td>Request/Response</td><td>Function Call</td><td>Resource-based</td></tr>
<tr>
<td>Transport Used</td><td>N/A</td><td>N/A</td><td>TCP</td><td>TCP/HTTP/Custom</td><td>HTTP</td></tr>
<tr>
<td>Speed</td><td>Moderate</td><td>Fast</td><td>Moderate</td><td>Fast</td><td>Moderate</td></tr>
<tr>
<td>Use Case</td><td>Raw data</td><td>Real-time</td><td>Web APIs</td><td>Microservices</td><td>Web APIs</td></tr>
</tbody>
</table>
</div><hr />
<h3 id="heading-in-practice">🤔 In Practice</h3>
<ul>
<li><p><strong>TCP vs UDP</strong> = how data is transferred</p>
</li>
<li><p><strong>HTTP</strong> = how clients/servers communicate over the web</p>
</li>
<li><p><strong>REST vs RPC</strong> = how APIs are designed</p>
</li>
<li><p><strong>REST over HTTP</strong> is a common web API pattern</p>
</li>
<li><p><strong>RPC</strong> can be over HTTP (e.g., gRPC with HTTP/2), or directly on TCP</p>
</li>
</ul>
<h2 id="heading-choosing-between-rest-and-rpc">Choosing Between REST and RPC</h2>
<p>The choice between REST and RPC boils down to the needs of your application:</p>
<ul>
<li><p><strong>Choose REST</strong> if simplicity, compatibility, and resource orientation are key.</p>
</li>
<li><p><strong>Choose RPC</strong> if performance, compact payloads, and action orientation are critical.</p>
</li>
</ul>
<hr />
<h2 id="heading-case-study-of-linkedin-latency-optimization-by-60">Case Study of LinkedIn latency optimization by 60%</h2>
<p>LinkedIn significantly improved its latency—by up to <strong>60%</strong>—by replacing <strong>JSON</strong> with <strong>Protocol Buffers (Protobuf)</strong> for data serialization. Here’s how they achieved this:</p>
<hr />
<h3 id="heading-1-why-did-linkedin-replace-json"><strong>1. Why Did LinkedIn Replace JSON?</strong></h3>
<p>JSON is widely used for serialization due to its human readability and simplicity, but it has several drawbacks:</p>
<ul>
<li><p><strong>High serialization/deserialization time</strong>: JSON relies on text-based encoding, which requires expensive parsing.</p>
</li>
<li><p><strong>Large payload sizes</strong>: JSON data is verbose due to repeated keys and lack of efficient binary encoding.</p>
</li>
<li><p><strong>More Network Bandwidth:</strong> JSON's verbose text encoding consumes more network bandwidth, which increases latency.</p>
</li>
<li><p><strong>High CPU usage</strong>: Serialization and deserialization are computationally expensive, especially for large-scale distributed systems.</p>
</li>
</ul>
<p>LinkedIn, handling billions of requests per day, faced <strong>latency issues</strong> and <strong>increased infrastructure costs</strong> due to these inefficiencies.</p>
<hr />
<h3 id="heading-2-how-did-protobuf-help"><strong>2. How Did Protobuf Help?</strong></h3>
<p><strong>a) Compact Binary Encoding</strong></p>
<ul>
<li><p>Protobuf is a binary format, which means it requires <strong>less bandwidth</strong> and <strong>less memory</strong> for transmission compared to JSON.</p>
</li>
<li><p>JSON includes redundant key names, while Protobuf uses numeric field tags, reducing data size significantly.</p>
</li>
</ul>
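<p>The size difference can be illustrated with Python's standard library. Here <code>struct</code> stands in for a real Protobuf encoding: field names disappear from the wire and values become packed binary, with the schema living in code instead:</p>
<pre><code class="lang-python">import json
import struct

record = {"user_id": 123456789, "active": True, "score": 4.5}

# Text encoding: key names are repeated inside every message.
as_json = json.dumps(record).encode()

# Binary encoding (illustrative, not actual Protobuf): fixed-width
# fields, no key names on the wire.
as_binary = struct.pack("!q?d", record["user_id"], record["active"], record["score"])

print(len(as_json), len(as_binary))  # the binary payload is several times smaller
</code></pre>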
<p><strong>b) Faster Serialization &amp; Deserialization</strong></p>
<ul>
<li><p>JSON requires <strong>string parsing</strong>, while Protobuf directly maps to <strong>efficient binary representations</strong>, leading to <strong>faster encoding/decoding</strong>.</p>
</li>
<li><p>This improves <strong>CPU efficiency</strong> and reduces <strong>garbage collection overhead</strong> in JVM-based applications.</p>
</li>
</ul>
<p><strong>c) Schema Evolution Without Breaking Changes</strong></p>
<ul>
<li><p>Protobuf supports <strong>backward and forward compatibility</strong>, allowing LinkedIn to evolve APIs smoothly without impacting older clients.</p>
</li>
<li><p>JSON lacks built-in schema enforcement, increasing the risk of breaking changes.</p>
</li>
</ul>
<hr />
<h3 id="heading-3-measured-performance-gains"><strong>3. Measured Performance Gains</strong></h3>
<p>LinkedIn observed the following improvements after switching to Protobuf:</p>
<ul>
<li><p><strong>Latency reduced by 60%</strong> (mostly due to faster serialization/deserialization).</p>
</li>
<li><p><strong>Payload size reduced by 50-80%</strong>, leading to lower <strong>network bandwidth usage</strong>.</p>
</li>
<li><p><strong>CPU utilization dropped</strong>, allowing better resource utilization.</p>
</li>
</ul>
<hr />
<h3 id="heading-4-where-did-linkedin-apply-protobuf"><strong>4. Where Did LinkedIn Apply Protobuf?</strong></h3>
<p>LinkedIn initially introduced Protobuf in its <strong>Venice key-value store</strong> and later expanded it to <strong>other services</strong> such as:</p>
<ul>
<li><p><a target="_blank" href="http://Rest.li"><strong>Rest.li</strong></a> (LinkedIn's API framework)</p>
</li>
<li><p><strong>Kafka messages</strong> for event streaming</p>
</li>
<li><p><strong>Inter-service communication</strong> within microservices</p>
</li>
</ul>
<hr />
<h3 id="heading-5-lessons-for-other-companies"><strong>5. Lessons for Other Companies</strong></h3>
<p>If your system is <strong>high-scale and latency-sensitive</strong>, switching from JSON to Protobuf can:</p>
<ul>
<li><p>Improve <strong>API performance</strong> in microservices.</p>
</li>
<li><p>Reduce <strong>cloud/server costs</strong> due to lower CPU and bandwidth usage.</p>
</li>
<li><p>Enhance <strong>data consistency</strong> with schema enforcement.</p>
</li>
</ul>
<p>However, Protobuf is <strong>not human-readable</strong>, which can make debugging harder compared to JSON. For applications requiring <strong>human interaction with APIs (e.g., REST APIs for web clients)</strong>, JSON may still be preferable.</p>
<p><strong>Reference:</strong> <a target="_blank" href="https://www.linkedin.com/blog/engineering/infrastructure/linkedin-integrates-protocol-buffers-with-rest-li-for-improved-m">https://www.linkedin.com/blog/engineering/infrastructure/linkedin-integrates-protocol-buffers-with-rest-li-for-improved-m</a></p>
]]></content:encoded></item><item><title><![CDATA[Partitioning vs Sharding: Key Concepts for Scalable Systems]]></title><description><![CDATA[In the realm of distributed systems and databases, partitioning and sharding are two terms that often come up when discussing scalability and performance. While they share similarities, they serve distinct purposes and are implemented differently. Th...]]></description><link>https://anishratnawat.com/partitioning-vs-sharding-key-concepts-for-scalable-systems</link><guid isPermaLink="true">https://anishratnawat.com/partitioning-vs-sharding-key-concepts-for-scalable-systems</guid><category><![CDATA[partitioning]]></category><category><![CDATA[sharding]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Sat, 20 Apr 2024 18:30:00 GMT</pubDate><content:encoded><![CDATA[<p>In the realm of distributed systems and databases, <strong>partitioning</strong> and <strong>sharding</strong> are two terms that often come up when discussing scalability and performance. While they share similarities, they serve distinct purposes and are implemented differently. This blog explores the nuances of partitioning and sharding, their use cases, and how to choose the right approach for your system.</p>
<hr />
<h2 id="heading-what-is-partitioning">What is Partitioning?</h2>
<p>Partitioning is the process of dividing a dataset into smaller, more manageable pieces called <strong>partitions</strong>. These partitions are stored separately but are part of the same database or storage system. Partitioning can improve performance, manageability, and scalability by reducing the size of data that needs to be handled by any single operation.</p>
<h3 id="heading-types-of-partitioning">Types of Partitioning</h3>
<ol>
<li><p><strong>Horizontal Partitioning:</strong></p>
<ul>
<li><p>Data is split by rows.</p>
</li>
<li><p>Each partition contains a subset of the rows, often based on a range or a key.</p>
</li>
<li><p>Example: Splitting user data based on user IDs (e.g., 1–1000 in Partition A, 1001–2000 in Partition B).</p>
</li>
</ul>
</li>
<li><p><strong>Vertical Partitioning:</strong></p>
<ul>
<li><p>Data is split by columns.</p>
</li>
<li><p>Different partitions store subsets of the attributes (columns).</p>
</li>
<li><p>Example: Separating frequently accessed columns into one table and less-used columns into another.</p>
</li>
</ul>
</li>
<li><p><strong>List Partitioning:</strong></p>
<ul>
<li><p>Data is partitioned based on a list of values.</p>
</li>
<li><p>Example: Orders partitioned by regions, like <code>North</code>, <code>South</code>, <code>East</code>, and <code>West</code>.</p>
</li>
</ul>
</li>
<li><p><strong>Hash Partitioning:</strong></p>
<ul>
<li><p>A hash function determines the partition for each data entry.</p>
</li>
<li><p>Example: Using a hash of the user ID modulo the number of partitions.</p>
</li>
</ul>
</li>
</ol>
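<p>The hash-partitioning rule above (a hash of the key modulo the number of partitions) can be written directly:</p>
<pre><code class="lang-python">import hashlib

NUM_PARTITIONS = 4

def partition_for(user_id):
    # Hash the key, then take it modulo the partition count.
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# The same key always lands in the same partition.
print(partition_for(42) == partition_for(42))   # True
print({partition_for(uid) for uid in range(1000)})  # keys spread across 0..3
</code></pre>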
<hr />
<h2 id="heading-what-is-sharding">What is Sharding?</h2>
<p>Sharding is a subset of partitioning that involves distributing data across multiple <strong>independent databases or nodes</strong>. Each shard is a self-contained unit with its own database instance, enabling horizontal scaling and fault isolation.</p>
<h3 id="heading-key-characteristics-of-sharding">Key Characteristics of Sharding</h3>
<ol>
<li><p><strong>Independent Databases:</strong></p>
<ul>
<li><p>Each shard operates as a standalone database with its own schema and storage.</p>
</li>
<li><p>Example: Shard 1 might store data for users with IDs 1–1000, while Shard 2 handles IDs 1001–2000.</p>
</li>
</ul>
</li>
<li><p><strong>Scalability:</strong></p>
<ul>
<li>Sharding allows the system to scale out by adding more nodes as the dataset grows.</li>
</ul>
</li>
<li><p><strong>Fault Isolation:</strong></p>
<ul>
<li>Issues in one shard (e.g., hardware failure) do not directly impact other shards.</li>
</ul>
</li>
<li><p><strong>Custom Shard Keys:</strong></p>
<ul>
<li>The shard key determines how data is distributed across shards. A poorly chosen shard key can lead to uneven distribution and hotspots.</li>
</ul>
</li>
</ol>
<hr />
<h2 id="heading-key-differences-between-partitioning-and-sharding">Key Differences Between Partitioning and Sharding</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Aspect</strong></td><td><strong>Partitioning</strong></td><td><strong>Sharding</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Scope</strong></td><td>Divides data within a single database instance.</td><td>Distributes data across multiple databases.</td></tr>
<tr>
<td><strong>Complexity</strong></td><td>Easier to implement and manage.</td><td>More complex, especially with distributed systems.</td></tr>
<tr>
<td><strong>Scaling</strong></td><td>Vertical scaling (limited by a single instance).</td><td>Horizontal scaling (adding more nodes).</td></tr>
<tr>
<td><strong>Fault Isolation</strong></td><td>Single point of failure in the database instance.</td><td>Isolated faults due to independent shards.</td></tr>
<tr>
<td><strong>Performance</strong></td><td>Limited by the capacity of one database.</td><td>Scales with the number of shards.</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-choosing-between-partitioning-and-sharding">Choosing Between Partitioning and Sharding</h2>
<p>When deciding between partitioning and sharding, consider the following:</p>
<ol>
<li><p><strong>Dataset Size:</strong></p>
<ul>
<li><p>Use partitioning if your dataset can fit within a single database instance but needs optimization.</p>
</li>
<li><p>Use sharding if your dataset is too large for a single instance.</p>
</li>
</ul>
</li>
<li><p><strong>Scaling Needs:</strong></p>
<ul>
<li>If you anticipate significant growth, sharding offers better horizontal scalability.</li>
</ul>
</li>
<li><p><strong>Complexity vs. Benefits:</strong></p>
<ul>
<li><p>Partitioning is simpler but limited in scalability.</p>
</li>
<li><p>Sharding requires more effort but enables handling massive datasets.</p>
</li>
</ul>
</li>
<li><p><strong>Fault Tolerance:</strong></p>
<ul>
<li>If fault isolation is crucial, sharding is the better choice.</li>
</ul>
</li>
</ol>
<hr />
<h2 id="heading-real-world-examples">Real-World Examples</h2>
<ol>
<li><p><strong>Partitioning:</strong></p>
<ul>
<li>A retail application partitions order data by year to speed up queries for recent transactions.</li>
</ul>
</li>
<li><p><strong>Sharding:</strong></p>
<ul>
<li>A social media platform shards user data by user ID to ensure that no single database becomes a bottleneck.</li>
</ul>
</li>
</ol>
<hr />
<h2 id="heading-conclusion">Conclusion</h2>
<p>Partitioning and sharding are essential techniques for building scalable, high-performance systems. While partitioning focuses on dividing data within a single database, sharding takes it a step further by distributing data across multiple databases. Choosing the right approach depends on your system’s size, scaling needs, and complexity tolerance.</p>
<p>Understanding these techniques and their trade-offs will help you design robust systems that can handle growth efficiently.</p>
]]></content:encoded></item><item><title><![CDATA[An Overview of Basic, JWT, API Key, and OAuth Authentication Techniques]]></title><description><![CDATA[In the world of distributed systems and modern APIs, authentication plays a critical role in securing resources and validating users. Choosing the right authentication method depends on use cases, system architecture, and security requirements. This ...]]></description><link>https://anishratnawat.com/an-overview-of-basic-jwt-api-key-and-oauth-authentication-techniques</link><guid isPermaLink="true">https://anishratnawat.com/an-overview-of-basic-jwt-api-key-and-oauth-authentication-techniques</guid><category><![CDATA[authentication]]></category><category><![CDATA[API Key authentication]]></category><category><![CDATA[basic authentication]]></category><category><![CDATA[JWT token,JSON Web,Token,Token authentication,Access token,JSON token,JWT security,JWT authentication,Token-based authentication,JWT decoding,JWT implementation]]></category><category><![CDATA[JWT]]></category><category><![CDATA[OAuth2]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Sat, 17 Feb 2024 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1734851810724/36681355-4df9-40c2-9830-e89bb2bcee91.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the world of distributed systems and modern APIs, authentication plays a critical role in securing resources and validating users. Choosing the right authentication method depends on use cases, system architecture, and security requirements. This blog explores five popular authentication methods: <strong>Basic Authentication, JWT (JSON Web Tokens), API Keys</strong>, and <strong>OAuth</strong>, along with their use cases, pros, and cons.</p>
<hr />
<h2 id="heading-1-basic-authentication">1. <strong>Basic Authentication</strong></h2>
<h3 id="heading-overview">Overview</h3>
<p>Basic Authentication is a simple way to verify users in REST APIs by sending a <strong><em>username and password in HTTP headers</em></strong>. It’s easy to use but less secure, especially without HTTPS, making it unsuitable for sensitive data or production use.</p>
<p>Here's a quick summary of Basic Authentication in REST APIs:</p>
<ol>
<li><p><strong>Client Request:</strong> The client sends a request to the server with authentication details in the request headers.</p>
</li>
<li><p><strong>Encoding:</strong> Username and password are combined as <code>username:password</code> and base64-encoded. Note: This is not encryption.</p>
</li>
<li><p><strong>Header:</strong> The encoded credentials are added to the HTTP request header like this:</p>
</li>
</ol>
<pre><code class="lang-http"><span class="hljs-attribute">Authorization</span>: Basic base64(username:password)
</code></pre>
<ol start="4">
<li><p><strong>Server Check:</strong> The server decodes the header, retrieves the username and password, and verifies them.</p>
</li>
<li><p><strong>Response:</strong> If valid, the server processes the request. If not, it returns a 401 Unauthorized error.</p>
</li>
</ol>
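<p>Steps 2 and 3 can be reproduced with the standard library (the credentials below are hypothetical):</p>
<pre><code class="lang-python">import base64

username, password = "alice", "s3cret"  # hypothetical credentials

# Step 2: combine as username:password and base64-encode (NOT encryption).
encoded = base64.b64encode(f"{username}:{password}".encode()).decode()

# Step 3: place the result in the Authorization header.
headers = {"Authorization": f"Basic {encoded}"}
print(headers["Authorization"])  # Basic YWxpY2U6czNjcmV0

# Anyone can trivially reverse the encoding, which is why HTTPS is mandatory.
print(base64.b64decode(encoded).decode())  # alice:s3cret
</code></pre>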
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734851397337/06031e9f-03b1-4ad2-8393-3841f5782387.png" alt class="image--center mx-auto" /></p>
<p>It’s important to use HTTPS when implementing Basic Authentication to encrypt the communication between the client and the server. Without it, the base64-encoded credentials travel in plain text and are vulnerable to interception.</p>
<h3 id="heading-use-cases">Use Cases</h3>
<ul>
<li><p>Legacy systems or internal applications.</p>
</li>
<li><p>Simple APIs with low security concerns.</p>
</li>
</ul>
<h3 id="heading-pros">Pros</h3>
<ul>
<li><p>Simple to implement and use.</p>
</li>
<li><p>Supported by all major HTTP clients and browsers.</p>
</li>
</ul>
<h3 id="heading-cons">Cons</h3>
<ul>
<li><p>Credentials are sent with every request, increasing exposure risk.</p>
</li>
<li><p>Base64 encoding is not encryption, so credentials are effectively sent in plain text; TLS is required for security.</p>
</li>
<li><p>No session management—every request re-sends credentials.</p>
</li>
</ul>
<h3 id="heading-real-life-example">Real-Life Example</h3>
<ul>
<li>Accessing internal tools or staging environments using a browser pop-up prompt.</li>
</ul>
<hr />
<h2 id="heading-2-token-based-authentication">2. <strong>Token Based Authentication</strong></h2>
<h3 id="heading-overview-1">Overview</h3>
<p>Token authentication is more secure than basic authentication since it involves using a <strong><em>unique token generated for each user</em></strong>. <strong>JSON Web Tokens (JWT)</strong> is a popular token-based authentication method.</p>
<p>JWTs are self-contained and can store user information, reducing the need for constant database queries. This token is sent with each request to authenticate the user. Token authentication is also a good choice for applications requiring frequent authentication, such as single-page or mobile applications.</p>
<p>Since the client does not send the password with every request, the user enters credentials once and receives a <strong><em>unique signed token</em></strong> valid for a specified <strong><em>session time</em></strong>. This makes authentication more efficient and able to handle more concurrent requests.</p>
<p>JWT (JSON Web Token) authentication works in a client-server interaction:</p>
<p><strong>1. User Login</strong></p>
<ul>
<li><p>The client sends login credentials (username and password) to the server.</p>
</li>
<li><p>The server verifies the credentials.</p>
</li>
</ul>
<p><strong>2. Token Generation</strong></p>
<ul>
<li><p>Upon successful authentication, the server generates a JWT.</p>
</li>
<li><p>The token contains:</p>
<ul>
<li><p><strong>Header</strong>: Specifies the token type (<code>JWT</code>) and signing algorithm.</p>
</li>
<li><p><strong>Payload</strong>: Contains user data (e.g., user ID) and claims.</p>
</li>
<li><p><strong>Signature</strong>: Ensures the token’s integrity using a secret key.</p>
</li>
</ul>
</li>
</ul>
<p>In serialised form, a JWT is a string in the following format:</p>
<pre><code class="lang-http">[header].[payload].[signature]
</code></pre>
<p>An actual JWT looks like this:</p>
<p><code>eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6MTIzNDU2Nzg5LCJuYW1lIjoiSm9zZXBoIn0.OpOSSw7e485LOP5PrzScxHb7SR6sAOMRckfFwi4rp7o</code></p>
<p>In deserialised form, the JWT looks like this:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"header"</span>: { <span class="hljs-attr">"alg"</span>: <span class="hljs-string">"HS256"</span>, <span class="hljs-attr">"typ"</span>: <span class="hljs-string">"JWT"</span> },
  <span class="hljs-attr">"payload"</span>: { <span class="hljs-attr">"sub"</span>: <span class="hljs-string">"1234567890"</span>, <span class="hljs-attr">"name"</span>: <span class="hljs-string">"John Doe"</span>, <span class="hljs-attr">"admin"</span>: <span class="hljs-literal">true</span> },
  <span class="hljs-attr">"signature"</span>: <span class="hljs-string">"signed-data"</span>
}
</code></pre>
<p><strong>3. Token Delivery</strong></p>
<ul>
<li>The server sends the JWT to the client (often in the response body or a cookie).</li>
</ul>
<p><strong>4. Token Usage</strong></p>
<ul>
<li><p>The client includes the JWT in the <code>Authorization</code> header (e.g., <code>Bearer &lt;token&gt;</code>) of subsequent requests.</p>
</li>
<li><p>The token serves as proof of authentication.</p>
</li>
</ul>
<p>Request Header has below:</p>
<pre><code class="lang-http"><span class="hljs-attribute">Authorization</span>: Bearer &lt;token&gt;
</code></pre>
<p><strong>5. Token Validation</strong></p>
<ul>
<li><p>The server validates the JWT by:</p>
<ul>
<li><p>Verifying the signature.</p>
</li>
<li><p>Checking the token’s expiration and claims.</p>
</li>
</ul>
</li>
</ul>
<p><strong>6. Access Granted</strong></p>
<ul>
<li><p>If the token is valid, the server processes the request and returns the response.</p>
</li>
<li><p>If invalid, the server denies access, often returning a <code>401 Unauthorized</code> status.</p>
</li>
</ul>
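<p>The sign-and-verify cycle can be sketched with only the standard library. This is a minimal HS256 illustration; production code should use a maintained JWT library:</p>
<pre><code class="lang-python">import base64, hashlib, hmac, json

SECRET = b"server-side-secret"  # hypothetical signing key

def b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign(payload):
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = b64url(hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"  # [header].[payload].[signature]

def verify(token):
    header, body, sig = token.split(".")
    expected = b64url(hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)  # constant-time comparison

token = sign({"sub": "1234567890", "name": "John Doe", "admin": True})
print(verify(token))  # True: signature matches

# Flip one character of the signature: verification must fail.
tampered = token[:-1] + ("A" if token[-1] != "A" else "B")
print(verify(tampered))  # False
</code></pre>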
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734851228489/fb01cb61-5f48-4487-8deb-13a6be149d0e.png" alt class="image--center mx-auto" /></p>
<p><strong>Key Points:</strong></p>
<ul>
<li><p>JWTs are <strong>stateless</strong>, meaning the server doesn’t store session information.</p>
</li>
<li><p>Expired or invalid tokens require re-authentication.</p>
</li>
<li><p>Tokens can include additional claims for granular access control.</p>
</li>
</ul>
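<p>To make the signing and validation steps concrete, here is a minimal HS256 sketch using only the Python standard library (illustrative only; production code should use a vetted library such as PyJWT):</p>
<pre><code class="lang-python">import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    # JWT uses base64url encoding with the padding stripped
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (b64url(json.dumps(header, separators=(",", ":")).encode())
                     + "." + b64url(json.dumps(payload, separators=(",", ":")).encode()))
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)

def verify_jwt(token: str, secret: bytes) -> bool:
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    # Constant-time comparison guards against timing attacks
    return hmac.compare_digest(b64url(expected), signature)

token = sign_jwt({"sub": "1234567890", "name": "John Doe", "admin": True}, b"my-secret")
</code></pre>
<p>Note that verification only checks the signature here; a real validator must also check the <code>exp</code> claim and any audience/issuer claims.</p>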
<h3 id="heading-use-cases-1">Use Cases</h3>
<ul>
<li><p>Short-lived API sessions.</p>
</li>
<li><p>Microservices communication.</p>
</li>
<li><p>Stateless authentication for web and mobile applications.</p>
</li>
<li><p>Single Sign-On (SSO) scenarios.</p>
</li>
</ul>
<h3 id="heading-pros-1">Pros</h3>
<ul>
<li><p>Self-contained and can carry user claims.</p>
</li>
<li><p>Stateless: No need to query the database after token issuance.</p>
</li>
<li><p>Supports token expiration and custom claims.</p>
</li>
</ul>
<h3 id="heading-cons-1">Cons</h3>
<ul>
<li><p>Larger token size compared to others.</p>
</li>
<li><p>If not properly invalidated, compromised tokens remain valid until expiry.</p>
</li>
<li><p>Need secure storage to avoid leakage.</p>
</li>
</ul>
<h3 id="heading-real-life-example-1">Real-Life Example</h3>
<ul>
<li>Accessing APIs in cloud platforms (e.g., AWS, Azure) and microservice architecture.</li>
</ul>
<hr />
<h2 id="heading-3-api-key-authentication">3. <strong>API Key Authentication</strong></h2>
<h3 id="heading-overview-2">Overview</h3>
<p>API keys are unique strings assigned to each client, included in requests to identify the client. They can be passed via headers, query parameters, or request bodies.</p>
<ol>
<li><p><strong>Obtaining API Key:</strong> Clients request an API key from the API provider. This is usually done through a developer portal or some registration process.</p>
</li>
<li><p><strong>Including API Key in Requests:</strong> Once the API key is obtained, it must be included in each API request.</p>
<pre><code class="lang-http"> GET /api/resource?api_key=123abc
</code></pre>
</li>
<li><p><strong>Server-Side Validation:</strong> The API server receives the request and extracts the API key from the specified location (URL parameter, header, etc.). The server checks the validity of the API key by comparing it against a list of authorized keys stored in its database.</p>
</li>
<li><p><strong>Authorization Check:</strong> Once the API key is validated, the server checks if the associated client or application has the necessary permissions to perform the requested action.</p>
</li>
</ol>
<p>While API keys are a straightforward authentication method, they have some limitations. The main concern is that API keys can be exposed easily if not handled securely, creating security risks. It is therefore essential to follow best practices such as using HTTPS, avoiding exposure of keys in client-side code or URLs, and implementing proper key management (rotation and revocation). For more sensitive applications, token-based authentication (e.g., JWTs) may be preferred.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734850721518/aa708cd3-80ea-45a1-9df4-ced446bbcc6a.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-use-cases-2">Use Cases</h3>
<ul>
<li><p>Public APIs with limited scope access.</p>
</li>
<li><p>Service-to-service communication.</p>
</li>
</ul>
<h3 id="heading-pros-2">Pros</h3>
<ul>
<li><p>Simple and easy to use.</p>
</li>
<li><p>Can be restricted by IP or referrer for added security.</p>
</li>
</ul>
<h3 id="heading-cons-2">Cons</h3>
<ul>
<li><p>No built-in user identity (just a key).</p>
</li>
<li><p>Vulnerable to theft if exposed in public repositories or URLs.</p>
</li>
<li><p>Hard to revoke individual keys unless tracked explicitly.</p>
</li>
</ul>
<h3 id="heading-real-life-example-2">Real-Life Example</h3>
<ul>
<li>APIs for weather, currency conversion, or third-party integration services.</li>
</ul>
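<p>The server-side validation step described above can be sketched as follows (the <code>X-API-Key</code> header name and in-memory key store are assumptions for illustration; real services typically store keys in a database with per-key metadata and quotas):</p>
<pre><code class="lang-python">import hmac

# Hypothetical key store; a real service would back this with a database
VALID_KEYS = {"123abc": {"client": "weather-app", "scopes": ["read"]}}

def authenticate(headers: dict):
    key = headers.get("X-API-Key", "")
    for stored, meta in VALID_KEYS.items():
        # compare_digest avoids leaking key contents via timing differences
        if hmac.compare_digest(stored, key):
            return meta
    return None  # caller should respond 401 Unauthorized
</code></pre>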
<hr />
<h2 id="heading-5-oauth-20">4. <strong>OAuth 2.0</strong></h2>
<h3 id="heading-overview-3">Overview</h3>
<p>Applications often need to interact with each other on behalf of users. Whether it’s granting access to your Google account for a new app or allowing a third-party service to post to your social media, <strong>OAuth 2.0</strong> has become the go-to solution for secure, seamless access delegation.</p>
<p>In this blog, we’ll explore the core concepts of OAuth 2.0, why it’s essential, and how it enables secure interactions between applications.</p>
<h3 id="heading-what-is-oauth-20">What is OAuth 2.0?</h3>
<p>OAuth 2.0 (Open Authorization 2.0) is an <strong><em>open standard protocol</em></strong> designed to provide secure authorization for applications without exposing user credentials. It allows users to grant limited access to third-party applications.</p>
<p>For example, when you sign in to an app using your Google account, OAuth 2.0 facilitates the process, ensuring the app gets access to your profile details without revealing your password.</p>
<h3 id="heading-key-concepts-in-oauth-20">Key Concepts in OAuth 2.0</h3>
<p>OAuth 2.0 operates through several key roles:</p>
<ul>
<li><p><strong>Resource Owner (User):</strong> The individual who owns the data and can grant access to it.</p>
</li>
<li><p><strong>Client (Application):</strong> The app requesting access to the user’s data.</p>
</li>
<li><p><strong>Authorization Server:</strong> The server that authenticates the user and grants tokens.</p>
</li>
<li><p><strong>Resource Server:</strong> The server that holds the user’s data and validates tokens for access.</p>
</li>
</ul>
<h3 id="heading-token-based-access">Token-Based Access</h3>
<p>OAuth 2.0 uses tokens instead of credentials to grant access. These tokens are temporary and can be tailored for specific permissions, making them more secure and flexible.</p>
<p>There are two main types of tokens:</p>
<ul>
<li><p><strong>Access Token</strong>: Used to access protected resources.</p>
</li>
<li><p><strong>Refresh Token</strong>: Used to obtain a new access token when the current one expires.</p>
</li>
</ul>
<h3 id="heading-common-oauth-20-grant-flows"><strong>Common OAuth 2.0 Grant Flows</strong></h3>
<ol>
<li><p><strong>Authorization Code Flow</strong> (most secure, used for server-side applications):</p>
<ul>
<li><p><strong>Authorization Request</strong>: The client redirects the user to the authorization server to log in and approve access.</p>
</li>
<li><p><strong>Authorization Grant</strong>: The authorization server provides an authorization code.</p>
</li>
<li><p><strong>Token Request</strong>: The client exchanges the authorization code for an access token by making a back-channel request.</p>
</li>
<li><p><strong>Resource Request</strong>: The client uses the access token to access the protected resource.</p>
</li>
</ul>
</li>
</ol>
<p>The user is redirected to the authorization server to log in and approve access:</p>
<pre><code class="lang-http">https://localhost:8080/realms/myrealm/protocol/openid-connect/auth
  ?response_type=code
  &amp;client_id=myclient
  &amp;redirect_uri=http://localhost:3000/callback
  &amp;scope=openid
</code></pre>
<p>After successful authentication, the authorization server redirects the user to the redirect URI with an authorization code:</p>
<p><code>http://localhost:3000/callback?code=AUTHORIZATION_CODE</code></p>
<p>The client then exchanges the authorization code for tokens with a back-channel request:</p>
<pre><code class="lang-bash">curl -X POST "http://localhost:8080/realms/myrealm/protocol/openid-connect/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=authorization_code" \
  -d "client_id=myclient" \
  -d "client_secret=mysecret" \
  -d "redirect_uri=http://localhost:3000/callback" \
  -d "code=AUTHORIZATION_CODE"
</code></pre>
<p>Response:</p>
<pre><code class="lang-json">{
  "access_token": "eyJhbGciOiJSUzI1NiIsInR5c...",
  "expires_in": 300,
  "refresh_token": "eyJhbGciOiJIUzI1NiIs...",
  "id_token": "eyJhbGciOiJSUzI1NiIsInR5c...",
  "token_type": "Bearer"
}
</code></pre>
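<p>Building the authorization request URL programmatically can be sketched like this (the Keycloak-style endpoint and the <code>myclient</code> values mirror the example above and are placeholders):</p>
<pre><code class="lang-python">from urllib.parse import urlencode, urlparse, parse_qs

# Parameters for the authorization request (step 1 of the flow)
params = {
    "response_type": "code",
    "client_id": "myclient",
    "redirect_uri": "http://localhost:3000/callback",
    "scope": "openid",
}
auth_url = ("https://localhost:8080/realms/myrealm/protocol/openid-connect/auth?"
            + urlencode(params))
</code></pre>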
<ol start="2">
<li><strong>Implicit Flow</strong> - deprecated (used for single-page applications):</li>
</ol>
<ul>
<li>The client directly receives the access token without an authorization code exchange.</li>
</ul>
<ol start="3">
<li><strong>Client Credentials Flow</strong> (used for machine-to-machine communication):</li>
</ol>
<ul>
<li><p>The client authenticates itself to the authorization server and directly obtains an access token.</p>
</li>
<li><p>Unlike the authorization code flow, it requires no callback URL and no authorization code exchange; the authorization server returns an access token (often a JWT) directly.</p>
</li>
<li><p>The client authenticates to the authorization server with its client ID and client secret.</p>
</li>
</ul>
<pre><code class="lang-bash">curl -X POST "https://auth.example.com/oauth/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "client_id=myclient" \
  -d "client_secret=mysecret" \
  -d "grant_type=client_credentials"
</code></pre>
<p>Response</p>
<pre><code class="lang-json">{
  "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCI...",
  "expires_in": 300,
  "token_type": "Bearer"
}
</code></pre>
<p>The client credentials flow does not issue a refresh token.</p>
<ol start="4">
<li><strong>Resource Owner Password Credentials Flow</strong> (discouraged due to security concerns):</li>
</ol>
<ul>
<li>The client directly collects the user’s credentials and exchanges them for an access token.</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734850283241/404624c2-37ea-4bd7-9ad1-88ca01ab876a.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-common-oauth-20-use-cases">Common OAuth 2.0 Use Cases</h3>
<ul>
<li><p><strong>Social Media Integration:</strong> Allowing apps to post on behalf of users.</p>
</li>
<li><p><strong>Cloud Storage Access:</strong> Enabling apps to fetch files from services like Google Drive or Dropbox.</p>
</li>
<li><p><strong>Payment Gateways:</strong> Granting access to payment platforms without sharing sensitive information.</p>
</li>
</ul>
<h3 id="heading-real-life-a-step-by-step-example">Real life : A Step-by-Step Example</h3>
<p>Let’s say you want to use a third-party app to analyze your Gmail data:</p>
<ol>
<li><p><strong>Authorization Request:</strong> The app redirects you to Google’s authorization server.</p>
</li>
<li><p><strong>User Consent:</strong> You log in and grant permission.</p>
</li>
<li><p><strong>Token Issuance:</strong> Google provides an access token to the app.</p>
</li>
<li><p><strong>Data Access:</strong> The app uses the token to fetch your Gmail data securely.</p>
</li>
</ol>
<h3 id="heading-pros-3"><strong>Pros</strong></h3>
<ul>
<li><p>Securely avoids sharing user credentials with third-party apps.</p>
</li>
<li><p>Supports Single Sign-On (SSO) for seamless user experience.</p>
</li>
<li><p>Provides granular access control through token scopes.</p>
</li>
<li><p>Offers token expiration and refresh for enhanced security.</p>
</li>
<li><p>Flexible for diverse use cases (e.g., mobile, web, API).</p>
</li>
<li><p>Widely adopted and integrated with major platforms.</p>
</li>
</ul>
<h3 id="heading-cons-3"><strong>Cons</strong></h3>
<ul>
<li><p>Complex implementation increases the risk of errors.</p>
</li>
<li><p>Token storage and management require careful handling.</p>
</li>
<li><p>Implicit flow deprecation impacts older implementations.</p>
</li>
<li><p>Reliance on third-party providers may raise privacy concerns.</p>
</li>
</ul>
<hr />
<h2 id="heading-choosing-the-right-authentication-method"><strong>Choosing the Right Authentication Method</strong></h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Authentication</strong></td><td><strong>Best for</strong></td><td><strong>Security</strong></td><td><strong>Ease of Implementation</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Basic</td><td>Simple apps</td><td>Low</td><td>High</td></tr>
<tr>
<td>JWT</td><td>SPAs, SSO</td><td>High</td><td>Moderate</td></tr>
<tr>
<td>API Key</td><td>Public APIs</td><td>Low</td><td>High</td></tr>
<tr>
<td>OAuth 2.0</td><td>Third-party access</td><td>Very High</td><td>Low</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Each authentication method has its own strengths and trade-offs. For simple use cases, <strong>API Keys</strong> or <strong>Basic Auth</strong> may suffice. For stateless, scalable systems, <strong>JWT</strong> is a strong choice. For delegated access and federated identity, <strong>OAuth 2.0</strong> is a robust option. Evaluate your security needs and system architecture before choosing the most appropriate solution.</p>
<p>Have any questions or insights about these methods? Let’s discuss in the comments! 👇</p>
]]></content:encoded></item><item><title><![CDATA[Understanding WebSockets: A Beginner’s Guide]]></title><description><![CDATA[WebSockets are a modern technology that makes real-time communication over the internet fast and efficient. Let’s break it down and explore why WebSockets are essential, how they work, and how they differ from traditional protocols like HTTP.
What Ar...]]></description><link>https://anishratnawat.com/understanding-websockets-a-beginners-guide</link><guid isPermaLink="true">https://anishratnawat.com/understanding-websockets-a-beginners-guide</guid><category><![CDATA[websockets]]></category><category><![CDATA[http]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Sat, 20 Jan 2024 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1734605944404/1716d668-15da-42ea-ab25-8eacc98b692f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>WebSockets are a modern technology that makes real-time communication over the internet fast and efficient. Let’s break it down and explore why WebSockets are essential, how they work, and how they differ from traditional protocols like HTTP.</p>
<h3 id="heading-what-are-websockets">What Are WebSockets?</h3>
<p>WebSockets are a communication protocol that allows for full-duplex (two-way) communication between a client (like a web browser) and a server. This means both the client and the server can send and receive messages at any time without waiting for the other to finish.</p>
<hr />
<h3 id="heading-why-are-websockets-used">Why Are WebSockets Used?</h3>
<p>WebSockets are increasingly popular because they address the limitations of older protocols like HTTP for real-time communication. Here are the main reasons for their usage:</p>
<ul>
<li><p><strong>Real-Time Data Exchange</strong>: Ideal for applications requiring instant updates, such as chat apps, live sports scores, or stock market tracking.</p>
</li>
<li><p><strong>Reduced Latency</strong>: Unlike HTTP, WebSockets keep the connection open, reducing the delay caused by repeatedly opening and closing connections.</p>
</li>
<li><p><strong>Two-Way Communication</strong>: Enables seamless interaction, such as collaborative editing in documents or multiplayer online games.</p>
</li>
<li><p><strong>Scalable Architecture</strong>: Efficient resource use makes WebSockets suitable for large-scale real-time applications.</p>
</li>
</ul>
<hr />
<h3 id="heading-how-do-websockets-work">How Do WebSockets Work?</h3>
<p>Here’s a step-by-step explanation of how WebSockets operate:</p>
<ol>
<li><p><strong>Handshake</strong>:</p>
<ul>
<li><p>The client initiates a connection with an HTTP request with a special header (“Upgrade”).</p>
</li>
<li><p>The server responds with <strong>status 101 Switching Protocols</strong> if it agrees to upgrade the connection from HTTP to WebSocket, or with an error status code if it does not support the upgrade.</p>
</li>
</ul>
</li>
</ol>
<p>Client request header look like this:</p>
<pre><code class="lang-http"><span class="hljs-keyword">GET</span> <span class="hljs-string">/chat</span> HTTP/1.1
<span class="hljs-attribute">Host</span>: example.com
<span class="hljs-attribute">Upgrade</span>: websocket
<span class="hljs-attribute">Connection</span>: Upgrade
<span class="hljs-attribute">Sec-WebSocket-Key</span>: x3JJHMbDL1EzLkh9GBhXDw==
<span class="hljs-attribute">Sec-WebSocket-Version</span>: 13
</code></pre>
<p>The <code>Upgrade</code> field is set to <code>websocket</code> and the <code>Connection</code> field to <code>Upgrade</code>, which denotes that the client is requesting a WebSocket connection over HTTP. <strong>Sec-WebSocket-Key</strong> is a randomly generated, base64-encoded value.</p>
<p>The server reads the <code>Sec-WebSocket-Key</code>, concatenates it with a globally unique identifier fixed by the WebSocket specification, and computes the SHA-1 hash of the concatenated string. The base64-encoded hash is returned in the <strong>Sec-WebSocket-Accept</strong> header if the server is willing to accept the connection.</p>
<p>The server completes the handshake by sending the following headers to the client:</p>
<pre><code class="lang-http">HTTP/1.1 <span class="hljs-number">101</span> Switching Protocols
<span class="hljs-attribute">Upgrade</span>: websocket
<span class="hljs-attribute">Connection</span>: Upgrade
<span class="hljs-attribute">Sec-WebSocket-Accept</span>: HSmrc0sMlYUkAGmm5OPpG2HaGWk=
</code></pre>
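<p>The <code>Sec-WebSocket-Accept</code> computation can be reproduced in a few lines of Python (the GUID is fixed by RFC 6455; the key and accept values match the handshake example above):</p>
<pre><code class="lang-python">import base64
import hashlib

# Globally unique identifier mandated by RFC 6455
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    # SHA-1 of key + GUID, then base64-encode the raw digest
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()
</code></pre>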
<ol start="2">
<li><p><strong>Connection Established</strong>:</p>
<ul>
<li>Once the handshake is complete, the connection stays open.</li>
</ul>
</li>
<li><p><strong>Real-Time Communication</strong>:</p>
<ul>
<li><p>Both the client and server can send messages to each other anytime.</p>
</li>
<li><p>Messages are exchanged using lightweight frames, ensuring efficiency.</p>
</li>
</ul>
</li>
<li><p><strong>Connection Closure</strong>:</p>
<ul>
<li>Either the client or the server can close the connection when the communication is complete.</li>
</ul>
</li>
</ol>
<p><strong>WebSocket has a default URI format</strong></p>
<pre><code class="lang-http">ws-URI  = "ws:" "//" host [ ":" port ] path [ "?" query ]
wss-URI = "wss:" "//" host [ ":" port ] path [ "?" query ]
</code></pre>
<p>wss denotes a secure WebSocket connection established over TLS. If no port is specified in the URI, ws defaults to port 80 and wss to port 443, the same defaults as HTTP and HTTPS.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734604907483/18444df0-b0d6-44fc-9e95-2d7b407b92a2.png" alt class="image--center mx-auto" /></p>
<hr />
<h3 id="heading-how-websocket-different-from-http">How WebSockets Differ from HTTP</h3>
<p>WebSockets and HTTP may seem similar since they both run over TCP, but their behavior is fundamentally different.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td><strong>WebSockets</strong></td><td><strong>HTTP</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Connection</strong></td><td>Persistent (open for long-term)</td><td>Stateless (one request-response cycle)</td></tr>
<tr>
<td><strong>Communication</strong></td><td>Full-duplex (two-way)</td><td>Half-duplex (client initiates)</td></tr>
<tr>
<td><strong>Overhead</strong></td><td>Minimal (after handshake)</td><td>High (headers for every request)</td></tr>
<tr>
<td><strong>Real-Time Suitability</strong></td><td>Excellent</td><td>Requires polling/long-polling</td></tr>
</tbody>
</table>
</div><p>The WebSocket protocol is a TCP-based protocol. Its only relationship to HTTP is that its handshake is interpreted by HTTP servers as an upgrade request.</p>
<hr />
<h3 id="heading-when-is-http-preferred-over-websockets">When Is HTTP Preferred Over WebSockets?</h3>
<p>While WebSockets excel in real-time communication scenarios, HTTP is still preferred in many situations due to its simplicity and widespread use. Here are some cases where HTTP is a better choice:</p>
<ul>
<li><p><strong>Static Content Delivery</strong>:</p>
<ul>
<li>For serving static assets like HTML, CSS, JavaScript, or images, HTTP is more straightforward and efficient.</li>
</ul>
</li>
<li><p><strong>Simple Request-Response</strong>:</p>
<ul>
<li>For operations like form submissions, API calls, or fetching data where one request yields one response, HTTP is sufficient.</li>
</ul>
</li>
<li><p><strong>Short-Lived Connections</strong>:</p>
<ul>
<li>For actions that do not require a persistent connection, such as loading a webpage or making occasional API requests.</li>
</ul>
</li>
<li><p><strong>Browser and Server Compatibility</strong>:</p>
<ul>
<li>HTTP is universally supported and works seamlessly with all browsers, servers, and proxies.</li>
</ul>
</li>
<li><p><strong>Security and Caching</strong>:</p>
<ul>
<li>HTTP benefits from established security protocols and caching mechanisms, making it ideal for delivering resources efficiently.</li>
</ul>
</li>
<li><p><strong>Lower Complexity</strong>:</p>
<ul>
<li>HTTP does not require the additional implementation effort needed for managing WebSocket connections and messages.</li>
</ul>
</li>
</ul>
<hr />
<h3 id="heading-conclusion">Conclusion</h3>
<p>WebSockets are a game-changer for applications that demand real-time communication. By enabling persistent and efficient two-way communication, WebSockets reduce latency and overhead, making them a go-to choice for modern web and mobile applications. While they might not replace HTTP entirely, they complement it by addressing specific use cases like live updates and interactivity. As real-time applications grow, understanding and leveraging WebSockets can be a significant advantage for developers.</p>
]]></content:encoded></item><item><title><![CDATA[HTTP/1.1 vs. HTTP/2 vs. HTTP/3: A Comprehensive Comparison, Limitations, and Adoption Trends]]></title><description><![CDATA[The Hypertext Transfer Protocol (HTTP) is the backbone of the web, evolving over decades to address performance, scalability, and security needs. Each version brought significant changes to overcome the limitations of its predecessor. This blog delve...]]></description><link>https://anishratnawat.com/http11-vs-http2-vs-http3-a-comprehensive-comparison-limitations-and-adoption-trends</link><guid isPermaLink="true">https://anishratnawat.com/http11-vs-http2-vs-http3-a-comprehensive-comparison-limitations-and-adoption-trends</guid><category><![CDATA[http1]]></category><category><![CDATA[http]]></category><category><![CDATA[http2]]></category><category><![CDATA[http3]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Fri, 29 Dec 2023 18:30:00 GMT</pubDate><content:encoded><![CDATA[<p>The Hypertext Transfer Protocol (HTTP) is the backbone of the web, evolving over decades to address performance, scalability, and security needs. Each version brought significant changes to overcome the limitations of its predecessor. This blog delves into the key differences between HTTP/1.1, HTTP/2, and HTTP/3, their limitations, and their adoption trends.</p>
<hr />
<h2 id="heading-1-http11-the-foundation"><strong>1. HTTP/1.1: The Foundation</strong></h2>
<p>In HTTP/1.0, every request to the same server required a separate TCP connection. HTTP/1.1, an improvement over HTTP/1.0, introduced persistent connections, allowing a single connection to be reused for multiple requests.</p>
<h3 id="heading-key-features"><strong>Key Features</strong></h3>
<ul>
<li><p><strong>Persistent Connections</strong>: Keeps the connection open for multiple requests, reducing TCP handshake overhead.</p>
</li>
<li><p><strong>Pipelining (Theoretical)</strong>: Allows multiple requests to be sent without waiting for responses, though rarely used due to head-of-line blocking.</p>
</li>
<li><p><strong>Caching Enhancements</strong>: Improved caching with headers like <code>ETag</code> and <code>Cache-Control</code>.</p>
</li>
</ul>
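<p>As an illustration of the caching enhancements, a client can revalidate a cached resource with <code>If-None-Match</code>; if the <code>ETag</code> still matches, the server replies <code>304 Not Modified</code> and the body is not re-sent (header values below are hypothetical):</p>
<pre><code class="lang-http">GET /style.css HTTP/1.1
If-None-Match: "abc123"

HTTP/1.1 304 Not Modified
ETag: "abc123"
Cache-Control: max-age=3600
</code></pre>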
<h3 id="heading-limitations"><strong>Limitations</strong></h3>
<ul>
<li><p><strong>Head-of-Line (HOL) Blocking</strong>: Each HTTP/1.x connection could handle only one request at a time. This limitation often led to inefficient use of network resources, as subsequent requests had to wait for the previous request to complete.</p>
</li>
<li><p><strong>Lack of prioritization:</strong> HTTP/1.x did not offer a way to prioritize requests, which could lead to less critical resources blocking more important ones.</p>
</li>
</ul>
<p>All of these issues have a large performance impact, especially on the modern web.</p>
<hr />
<h2 id="heading-2-http2-multiplexing-for-the-modern-web"><strong>2. HTTP/2: Multiplexing for the Modern Web</strong></h2>
<p>HTTP/2, released in 2015, introduced major improvements in performance and efficiency. It solves the major issues of HTTP/1.x.</p>
<h3 id="heading-key-features-1"><strong>Key Features</strong></h3>
<ul>
<li><p><strong>Multiplexing</strong>: Solves HOL issue and allows multiple requests and responses to be sent simultaneously over a single TCP connection.</p>
</li>
<li><p><strong>Header Compression (HPACK)</strong>: Compresses headers to reduce overhead.</p>
</li>
<li><p><strong>Stream Prioritization</strong>: Enables prioritization of critical resources for faster page loading.</p>
</li>
<li><p><strong>Server Push</strong>: Allows servers to send resources proactively.</p>
</li>
</ul>
<h3 id="heading-limitations-1"><strong>Limitations</strong></h3>
<ul>
<li><p><strong>HOL Blocking at TCP Level</strong>: While HTTP/2 solves HOL blocking at the application level, it still exists at the TCP layer.</p>
</li>
<li><p><strong>Complexity</strong>: Implementing HTTP/2 requires more sophisticated server and client logic.</p>
</li>
</ul>
<p>HTTP/2 relies on the same underlying protocol in order to operate: TCP. This is both a positive and a negative. Because TCP is used by HTTP/1.x already it means adoption is much easier; browsers don't need to implement a new underlying protocol, and servers can continue operating as they are now with a few tweaks to implement the HTTP/2 features. The downside is that there are issues with TCP, especially in high-latency and lossy networks.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733163139477/58266277-94ed-4dfa-b3ad-893bb6430c98.png" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-3-http3-the-quic-revolution"><strong>3. HTTP/3: The QUIC Revolution</strong></h2>
<p>HTTP/3, finalized in 2022, represents a radical departure from the previous versions by using QUIC instead of TCP. QUIC is a transport protocol built on UDP, designed to reduce latency and improve resilience.</p>
<h3 id="heading-key-features-2"><strong>Key Features</strong></h3>
<ul>
<li><p><strong>QUIC Protocol</strong>: QUIC provides multiplexing without HOL blocking at the transport layer.</p>
</li>
<li><p><strong>Zero Round-Trip Time (0-RTT) Resumption</strong>: Reduces latency for repeat connections.</p>
</li>
<li><p><strong>Connection Migration</strong>: QUIC connections can seamlessly continue across network changes (e.g., switching from Wi-Fi to mobile).</p>
</li>
<li><p><strong>Built-in encryption:</strong> QUIC incorporates Transport Layer Security (TLS) 1.3 by default, ensuring a secure connection without the need for a separate TLS handshake. This reduces latency and improves connection establishment time.</p>
</li>
<li><p><strong>Improved congestion control:</strong> QUIC offers more advanced congestion control mechanisms, allowing it to better adapt to varying network conditions and improve overall performance.</p>
</li>
</ul>
<h3 id="heading-limitations-2"><strong>Limitations</strong></h3>
<ul>
<li><p><strong>UDP Overhead</strong>: QUIC uses UDP, which can be blocked or throttled by some network configurations.</p>
</li>
<li><p><strong>Adoption and Support</strong>: Requires updates to infrastructure (e.g., load balancers, firewalls) to handle QUIC.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733163485884/40149159-87c6-4447-a7ed-a6da7ac06a4b.png" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-performance-comparison"><strong>Performance Comparison</strong></h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>HTTP/1.1</td><td>HTTP/2</td><td>HTTP/3</td></tr>
</thead>
<tbody>
<tr>
<td>Multiplexing</td><td>❌</td><td>✅ (Application Level)</td><td>✅ (Transport Level)</td></tr>
<tr>
<td>Head-of-Line Blocking</td><td>✅</td><td>✅ (TCP)</td><td>❌</td></tr>
<tr>
<td>Compression</td><td>❌</td><td>✅ (HPACK)</td><td>✅ (QPACK)</td></tr>
<tr>
<td>Connection Establishment</td><td>TCP (3-RTT)</td><td>TCP (3-RTT)</td><td>QUIC (1-RTT or 0-RTT)</td></tr>
<tr>
<td>Encryption</td><td>Optional (TLS)</td><td>Mandatory (TLS 1.2/1.3)</td><td>Built-in (TLS 1.3 in QUIC)</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-adoption-trends"><strong>Adoption Trends</strong></h2>
<ul>
<li><p><strong>HTTP/1.1</strong>: Still widely used, especially in legacy systems.</p>
</li>
<li><p><strong>HTTP/2</strong>: Adoption is strong, supported by most modern browsers and servers. Many CDNs default to HTTP/2.</p>
</li>
<li><p><strong>HTTP/3</strong>: Adoption is growing rapidly, led by major players like Google and Cloudflare. Browser support is robust, but infrastructure adoption is catching up.</p>
</li>
</ul>
<h3 id="heading-adoption-challenges-for-http3"><strong>Adoption Challenges for HTTP/3</strong></h3>
<ul>
<li><p><strong>Infrastructure Compatibility</strong>: Many middleboxes (firewalls, load balancers) need updates to handle QUIC.</p>
</li>
<li><p><strong>UDP Blockage</strong>: Some networks block or deprioritize UDP traffic, limiting QUIC’s effectiveness.</p>
</li>
</ul>
<hr />
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Each version of HTTP addresses critical performance bottlenecks in its predecessor. While HTTP/1.1 laid the groundwork for modern web communication, HTTP/2 improved performance with multiplexing and compression. HTTP/3, with its use of QUIC, represents the future of web protocols by solving head-of-line blocking at the transport level and reducing latency.</p>
<p>As the web continues to evolve, HTTP/3 adoption will likely accelerate, driven by the demand for faster, more resilient connections. However, full adoption will depend on updating network infrastructure and overcoming the challenges associated with UDP-based protocols.</p>
<hr />
<h3 id="heading-further-reading"><strong>Further Reading</strong></h3>
<ul>
<li><p><a target="_blank" href="https://datatracker.ietf.org/doc/html/rfc9114">IETF HTTP/3 Standard</a></p>
</li>
<li><p><a target="_blank" href="https://datatracker.ietf.org/doc/html/rfc9000">QUIC Protocol</a></p>
</li>
<li><p><a target="_blank" href="https://www.chromium.org/quic/">Google’s QUIC Initiative</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Navigating System Design: How Scalability, Reliability, Availability, and Performance Shape Success]]></title><description><![CDATA[When designing modern distributed systems, four key attributes stand out as pillars of success: Scalability, Reliability, Availability, and Performance. Together, these "Fantastic Four" guide architects and engineers to build robust systems capable o...]]></description><link>https://anishratnawat.com/navigating-system-design-how-scalability-reliability-availability-and-performance-shape-success</link><guid isPermaLink="true">https://anishratnawat.com/navigating-system-design-how-scalability-reliability-availability-and-performance-shape-success</guid><category><![CDATA[System Design]]></category><category><![CDATA[Reliability]]></category><category><![CDATA[availability]]></category><category><![CDATA[scalability]]></category><category><![CDATA[performance]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Fri, 08 Dec 2023 18:30:00 GMT</pubDate><content:encoded><![CDATA[<p>When designing modern distributed systems, four key attributes stand out as pillars of success: <strong>Scalability</strong>, <strong>Reliability</strong>, <strong>Availability</strong>, and <strong>Performance</strong>. Together, these "Fantastic Four" guide architects and engineers to build robust systems capable of meeting diverse and demanding requirements. Let’s explore each of these in detail</p>
<hr />
<h3 id="heading-1-scalability-designing-for-growth">1. <strong>Scalability: Designing for Growth</strong></h3>
<p>Scalability refers to a system's ability to handle increased load by adding resources, either horizontally (more machines) or vertically (better machines). A well-designed scalable system maintains its performance as load grows.</p>
<h4 id="heading-few-key-considerations-for-scalability">Key Considerations for Scalability:</h4>
<ul>
<li><p><strong>Load Balancing:</strong> Distributing requests across multiple servers to avoid bottlenecks.</p>
</li>
<li><p><strong>Stateless Services:</strong> Stateless design enables easier scaling as each request can be handled independently.</p>
</li>
<li><p><strong>Partitioning/Sharding:</strong> Splitting data across different databases or servers.</p>
</li>
</ul>
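<p>As a concrete sketch of partitioning, the routing decision can be as small as one hash function. This is an illustrative, minimal example (the key format and shard count are assumptions), shown here in Python:</p>

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Deterministically map a key to a shard via a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Every server computes the same answer for the same key,
# so no central lookup table is needed to route a request.
print(shard_for("user:42", num_shards=4))
```

<p>Note that with plain modulo hashing, changing the shard count remaps most keys; production systems often use consistent hashing instead to limit data movement when nodes are added or removed.</p>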
<p><strong>Example:</strong><br />Think of an e-commerce platform during a holiday sale. The system must scale to handle millions of requests and transactions simultaneously.</p>
<hr />
<h3 id="heading-2-reliability-building-trust-in-the-system">2. <strong>Reliability: Building Trust in the System</strong></h3>
<p>Reliability ensures that the system performs correctly under expected conditions and gracefully degrades under unexpected conditions. A reliable system minimizes failures and provides consistent results.</p>
<h4 id="heading-techniques-to-enhance-reliability">Techniques to Enhance Reliability:</h4>
<ul>
<li><p><strong>Redundancy:</strong> Duplicating critical components to avoid single points of failure.</p>
</li>
<li><p><strong>Failover Mechanisms:</strong> Automatically switching to backup systems during a failure.</p>
</li>
<li><p><strong>Data Replication:</strong> Keeping multiple copies of data across different nodes or regions.</p>
</li>
</ul>
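<p>A failover mechanism can be sketched as a client that tries replicas in order until one succeeds. This is a simplified illustration (the replica interface is an assumption), not a production pattern:</p>

```python
def call_with_failover(replicas, request):
    """Try each replica in order; return the first successful response."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as err:
            last_error = err  # this replica is down, try the next one
    raise RuntimeError("all replicas failed") from last_error

def primary(_request):
    raise ConnectionError("primary unreachable")  # simulated outage

def backup(request):
    return "handled:" + request

print(call_with_failover([primary, backup], "ping"))  # handled:ping
```

<p>Real failover adds timeouts, retry budgets, and backoff so that a slow or flapping replica does not stall every request.</p>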
<p><strong>Example:</strong><br />Payment gateways rely heavily on reliability. Even a minor glitch can lead to financial losses or customer dissatisfaction.</p>
<hr />
<h3 id="heading-3-availability-ensuring-uptime">3. <strong>Availability: Ensuring Uptime</strong></h3>
<p>Availability is about how often a system is operational and accessible. It is measured as an <strong>uptime</strong> percentage. High availability (HA) systems aim for <strong>99.99% uptime</strong> or better, which allows roughly 52 minutes of downtime per year.</p>
<h4 id="heading-strategies-to-achieve-high-availability">Strategies to Achieve High Availability:</h4>
<ul>
<li><p><strong>Load Balancers and Health Checks:</strong> Continuously monitor services and route traffic to healthy nodes.</p>
</li>
<li><p><strong>Distributed Systems:</strong> Spreading services across multiple data centers ensures availability even during regional outages.</p>
</li>
<li><p><strong>Graceful Degradation:</strong> Allowing partial functionality when full service is not possible (e.g., read-only mode for a database).</p>
</li>
</ul>
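<p>The first strategy can be sketched as a round-robin balancer that skips nodes failing their health check. The node and health-check interfaces here are illustrative:</p>

```python
class HealthCheckedBalancer:
    """Round-robin over nodes, skipping any that fail the health check."""

    def __init__(self, nodes, is_healthy):
        self.nodes = nodes
        self.is_healthy = is_healthy
        self._next = 0

    def route(self, request):
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if self.is_healthy(node):
                return node(request)
        raise RuntimeError("no healthy node available")

def node_a(request):
    return "a:" + request

def node_b(request):
    return "b:" + request

healthy = {node_b}  # node_a is currently failing its health check
balancer = HealthCheckedBalancer([node_a, node_b], lambda n: n in healthy)
print(balancer.route("GET /"))  # b:GET /
```

<p>Traffic keeps flowing to the healthy node; when node_a passes its check again, it rejoins the rotation automatically.</p>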
<p><strong>Example:</strong><br />Social media platforms prioritize availability to ensure users can access their accounts at any time, across the globe.</p>
<hr />
<h3 id="heading-4-performance-speed-and-efficiency">4. <strong>Performance: Speed and Efficiency</strong></h3>
<p>Performance measures how fast and efficiently a system processes requests and delivers results. Poor performance can drive users away, regardless of other attributes.</p>
<h4 id="heading-key-performance-metrics">Key Performance Metrics:</h4>
<ul>
<li><p><strong>Latency:</strong> Time taken to process a request.</p>
</li>
<li><p><strong>Throughput:</strong> Number of requests processed per unit time.</p>
</li>
<li><p><strong>Resource Utilization:</strong> CPU, memory, and network bandwidth usage.</p>
</li>
</ul>
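<p>Latency is usually reported as percentiles rather than averages, because a few slow requests can hide behind a good mean. A minimal percentile calculation (nearest-rank method, with made-up sample timings):</p>

```python
import math

def latency_percentile(latencies_ms, pct):
    """Return the pct-th percentile latency using the nearest-rank method."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

samples = [12, 15, 11, 90, 14, 13, 16, 12, 15, 250]  # illustrative timings
print(latency_percentile(samples, 50))  # 14  (typical request)
print(latency_percentile(samples, 99))  # 250 (tail latency)
```

<p>The p50 and p99 here differ by more than an order of magnitude, which is exactly what an average would obscure.</p>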
<h4 id="heading-performance-optimization-techniques">Performance Optimization Techniques:</h4>
<ul>
<li><p><strong>Caching:</strong> Storing frequently accessed data in memory to reduce response times.</p>
</li>
<li><p><strong>Content Delivery Networks (CDNs):</strong> Distributing static content closer to users.</p>
</li>
<li><p><strong>Asynchronous Processing:</strong> Handling non-critical tasks in the background.</p>
</li>
</ul>
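<p>The caching idea can be sketched as a small in-memory cache whose entries expire after a time-to-live. The names and the 60-second TTL are illustrative:</p>

```python
import time

class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: evict lazily on read
            return None
        return value

cache = TTLCache(ttl_seconds=60)
cache.set("product:7", {"name": "widget"})
print(cache.get("product:7"))  # served from memory, no database round trip
```

<p>The TTL bounds staleness: a value is served from memory for at most 60 seconds before the next read falls through to the source of truth again.</p>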
<p><strong>Example:</strong><br />Search engines like Google prioritize performance to return search results in milliseconds, enhancing user experience.</p>
<hr />
<h3 id="heading-final-thoughts">Final Thoughts</h3>
<p>The "Fantastic Four" of system design—scalability, reliability, availability, and performance—are not just buzzwords but essential principles that drive the architecture of modern systems. Mastering these concepts empowers engineers to build systems that not only meet today’s demands but are also prepared for future challenges.</p>
<p>In your next project, consider these pillars as guiding stars to ensure success in the ever-evolving landscape of distributed systems.</p>
]]></content:encoded></item><item><title><![CDATA[Scaling: Horizontal vs Vertical – What You Need to Know]]></title><description><![CDATA[Scaling is a crucial aspect of designing systems that can handle increasing workloads. Whether you're building a distributed system, a web application, or a backend service, choosing the right scaling strategy can significantly impact performance, co...]]></description><link>https://anishratnawat.com/scaling-horizontal-vs-vertical-what-you-need-to-know</link><guid isPermaLink="true">https://anishratnawat.com/scaling-horizontal-vs-vertical-what-you-need-to-know</guid><category><![CDATA[horizontal scaling]]></category><category><![CDATA[vertical scaling]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Mon, 04 Dec 2023 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1733078277456/6febf95d-cfbc-467b-a2dc-db4b664e13f6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Scaling is a crucial aspect of designing systems that can handle increasing workloads. Whether you're building a distributed system, a web application, or a backend service, choosing the right scaling strategy can significantly impact performance, cost, and manageability. In this post, we'll explore two primary scaling strategies: <strong>horizontal scaling</strong> and <strong>vertical scaling</strong>, and help you decide when to use each.</p>
<hr />
<h2 id="heading-what-is-vertical-scaling">What is Vertical Scaling?</h2>
<p>Vertical scaling (also known as <strong>scaling up</strong>) involves adding more resources to a single machine. This can include:</p>
<ul>
<li><p>Adding more <strong>CPU cores</strong></p>
</li>
<li><p>Increasing <strong>RAM</strong></p>
</li>
<li><p>Upgrading to faster <strong>storage</strong> (SSDs)</p>
</li>
</ul>
<h3 id="heading-benefits-of-vertical-scaling"><strong>Benefits of Vertical Scaling:</strong></h3>
<ul>
<li><p><strong>Simplicity:</strong> No need to modify your application architecture.</p>
</li>
<li><p><strong>Quick to implement:</strong> Often requires only hardware upgrades or moving to a larger instance in cloud environments.</p>
</li>
<li><p><strong>Consistent performance:</strong> No need for load balancing or data replication.</p>
</li>
</ul>
<h3 id="heading-challenges"><strong>Challenges:</strong></h3>
<ul>
<li><p><strong>Hardware limits:</strong> There’s a ceiling to how much you can scale a single machine.</p>
</li>
<li><p><strong>Downtime risks:</strong> Upgrading a machine often requires downtime, impacting availability.</p>
</li>
<li><p><strong>Single point of failure:</strong> The system remains dependent on one machine.</p>
</li>
</ul>
<h3 id="heading-when-to-use-vertical-scaling"><strong>When to Use Vertical Scaling:</strong></h3>
<ul>
<li><p>Applications with <strong>monolithic architectures</strong>.</p>
</li>
<li><p>Systems where downtime for upgrades is acceptable.</p>
</li>
<li><p>When simplicity is a priority and workloads are predictable.</p>
</li>
</ul>
<hr />
<h2 id="heading-what-is-horizontal-scaling">What is Horizontal Scaling?</h2>
<p>Horizontal scaling (also known as <strong>scaling out</strong>) involves adding more machines (nodes) to distribute the load. In cloud environments, this often means deploying more instances of your application.</p>
<h3 id="heading-benefits-of-horizontal-scaling"><strong>Benefits of Horizontal Scaling:</strong></h3>
<ul>
<li><p><strong>Near-unlimited scaling potential:</strong> You can keep adding nodes as load grows.</p>
</li>
<li><p><strong>High availability:</strong> If one node fails, others can continue handling the load.</p>
</li>
<li><p><strong>Resilience:</strong> With proper load balancing, the system can tolerate failures better.</p>
</li>
</ul>
<h3 id="heading-challenges-1"><strong>Challenges:</strong></h3>
<ul>
<li><p><strong>Complexity:</strong> Requires changes to the application architecture to support distributed systems.</p>
</li>
<li><p><strong>Data consistency:</strong> Maintaining data consistency across multiple nodes can be challenging.</p>
</li>
<li><p><strong>Load balancing:</strong> You need effective strategies to distribute traffic across nodes.</p>
</li>
</ul>
<h3 id="heading-when-to-use-horizontal-scaling"><strong>When to Use Horizontal Scaling:</strong></h3>
<ul>
<li><p>Systems that need to handle <strong>large-scale traffic</strong> or have unpredictable workloads.</p>
</li>
<li><p>Applications built using <strong>microservices</strong> or <strong>distributed architectures</strong>.</p>
</li>
<li><p>Scenarios where <strong>high availability</strong> is a requirement.</p>
</li>
</ul>
<hr />
<h2 id="heading-a-comparative-table-horizontal-vs-vertical-scaling">A Comparative Table: Horizontal vs Vertical Scaling</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>Vertical Scaling</strong></td><td><strong>Horizontal Scaling</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Cost Efficiency</strong></td><td>Expensive</td><td>More cost-effective at scale</td></tr>
<tr>
<td><strong>Complexity</strong></td><td>Low complexity</td><td>Higher complexity</td></tr>
<tr>
<td><strong>Fault Tolerance</strong></td><td>Low (single point of failure)</td><td>High (redundancy across nodes)</td></tr>
<tr>
<td><strong>Downtime</strong></td><td>Potential downtime during upgrades</td><td>Minimal downtime with new nodes</td></tr>
<tr>
<td><strong>Scaling Limit</strong></td><td>Hardware limitations</td><td>Virtually unlimited</td></tr>
<tr>
<td><strong>Application Changes</strong></td><td>Minimal</td><td>Requires architecture changes</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-key-considerations-when-choosing-a-scaling-strategy">Key Considerations When Choosing a Scaling Strategy</h2>
<ul>
<li><p><strong>Workload Characteristics:</strong><br />  If your application has bursty traffic, horizontal scaling can handle spikes better with load balancing.</p>
</li>
<li><p><strong>Budget Constraints:</strong><br />  Vertical scaling might be suitable for smaller applications where the cost of multiple nodes is prohibitive.</p>
</li>
<li><p><strong>Architecture Design:</strong><br />  Microservices and stateless applications thrive with horizontal scaling, while monolithic apps often require vertical scaling.</p>
</li>
<li><p><strong>Cloud Provider Features:</strong><br />  Cloud platforms like AWS, Azure, and GCP offer auto-scaling groups, making horizontal scaling more accessible.</p>
</li>
</ul>
<hr />
<h2 id="heading-real-world-examples">Real-World Examples</h2>
<ul>
<li><p><strong>Vertical Scaling:</strong><br />  A relational database like <strong>PostgreSQL</strong> on a single server can benefit from vertical scaling by adding more CPU and RAM.</p>
</li>
<li><p><strong>Horizontal Scaling:</strong><br />  Web applications using <strong>Kubernetes</strong> can deploy additional pods to handle increased traffic, making horizontal scaling seamless.</p>
</li>
</ul>
<hr />
<h2 id="heading-conclusion">Conclusion</h2>
<p>Both horizontal and vertical scaling have their place in system design. While vertical scaling offers simplicity and quick upgrades, horizontal scaling provides better fault tolerance and scalability. As a software engineer, understanding your application’s needs and workload patterns is critical to making the right decision.</p>
<p>Do you have insights on scaling strategies? Share them in the comments!</p>
]]></content:encoded></item><item><title><![CDATA[Stateful vs Stateless Applications: Key Differences and Design Considerations]]></title><description><![CDATA[In the world of distributed systems and modern application architecture, understanding whether to design an application as stateful or stateless can significantly impact performance, scalability, and user experience. This blog explores these concepts...]]></description><link>https://anishratnawat.com/stateful-vs-stateless-applications-key-differences-and-design-considerations</link><guid isPermaLink="true">https://anishratnawat.com/stateful-vs-stateless-applications-key-differences-and-design-considerations</guid><category><![CDATA[#StatefulApplications]]></category><category><![CDATA[StateLESS]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Tue, 13 Jul 2021 18:30:00 GMT</pubDate><content:encoded><![CDATA[<p>In the world of distributed systems and modern application architecture, understanding whether to design an application as <strong>stateful</strong> or <strong>stateless</strong> can significantly impact performance, scalability, and user experience. This blog explores these concepts, highlights their pros and cons, and provides design guidelines.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733075272437/d48a200c-daea-4428-ae67-a1e6f5cfc703.png" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-stateful-applications"><strong>Stateful Applications</strong></h2>
<p>A <strong>stateful application</strong> maintains state information between requests. This means the server keeps track of the user’s interactions, storing data like session information, user preferences, and temporary data across multiple client requests.</p>
<h3 id="heading-key-characteristics"><strong>Key Characteristics:</strong></h3>
<ul>
<li><p><strong>Session Management:</strong> State information is maintained on the server (e.g., session IDs, user data).</p>
</li>
<li><p><strong>Resource Dependence:</strong> Requires consistent access to the same server or storage.</p>
</li>
<li><p><strong>Failure Handling:</strong> Complex, as state restoration is necessary after crashes.</p>
</li>
</ul>
<h3 id="heading-examples"><strong>Examples:</strong></h3>
<ul>
<li><p>Banking applications (maintaining session data)</p>
</li>
<li><p>Video conferencing tools (preserving connection state)</p>
</li>
</ul>
<h3 id="heading-pros"><strong>Pros:</strong></h3>
<ul>
<li><p>Simplified user experience since state persistence allows continuity.</p>
</li>
<li><p>Easier to handle complex workflows where intermediate data is needed.</p>
</li>
</ul>
<h3 id="heading-cons"><strong>Cons:</strong></h3>
<ul>
<li><p>Scalability challenges as servers need to retain state.</p>
</li>
<li><p>Complex failure handling, as state recovery is needed after server crashes.</p>
</li>
</ul>
<hr />
<h2 id="heading-stateless-applications"><strong>Stateless Applications</strong></h2>
<p>Stateless applications treat each request independently. No session data is stored on the server, and each request carries all the information needed for processing.</p>
<h3 id="heading-key-characteristics-1"><strong>Key Characteristics:</strong></h3>
<ul>
<li><p><strong>Independent Requests:</strong> Every request is self-contained.</p>
</li>
<li><p><strong>Scalable Architecture:</strong> Easy to scale by adding more servers.</p>
</li>
<li><p><strong>Fault Tolerance:</strong> No state recovery is needed if a server crashes.</p>
</li>
</ul>
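<p>One common way to make each request self-contained is a signed token: any server holding the shared key can verify it, so no session store is needed. This is a bare-bones sketch (the key and payload are illustrative; real systems use a vetted format such as JWT):</p>

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"shared-secret"  # illustrative; keep real keys out of source code

def issue_token(payload):
    """Encode the payload and sign it so tampering is detectable."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    signature = hmac.new(SECRET_KEY, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + signature

def verify_token(token):
    """Return the payload if the signature checks out, else None."""
    body, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None  # forged or tampered token
    return json.loads(base64.urlsafe_b64decode(body))

token = issue_token({"user": "alice", "role": "viewer"})
print(verify_token(token))  # {'user': 'alice', 'role': 'viewer'}
```

<p>Because every server with the key can verify the token independently, any node can handle any request, which is what makes horizontal scaling straightforward.</p>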
<h3 id="heading-use-cases"><strong>Use Cases:</strong></h3>
<ul>
<li><p>RESTful APIs and microservices.</p>
</li>
<li><p>Content delivery networks (CDNs).</p>
</li>
<li><p>Serverless computing platforms.</p>
</li>
</ul>
<h3 id="heading-pros-and-cons"><strong>Pros and Cons:</strong></h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Pros</td><td>Cons</td></tr>
</thead>
<tbody>
<tr>
<td>Easy horizontal scaling.</td><td>Repetitive data transmission.</td></tr>
<tr>
<td>Simplified fault tolerance.</td><td>Complex workflows need extra handling.</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-stateful-vs-stateless-side-by-side-comparison"><strong>Stateful vs Stateless: Side-by-Side Comparison</strong></h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>Stateful Applications</strong></td><td><strong>Stateless Applications</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>State Management</strong></td><td>Maintains state across sessions.</td><td>No state is maintained.</td></tr>
<tr>
<td><strong>Scalability</strong></td><td>Challenging due to session affinity.</td><td>Easy horizontal scaling.</td></tr>
<tr>
<td><strong>Failure Handling</strong></td><td>Requires state restoration.</td><td>No state recovery needed.</td></tr>
<tr>
<td><strong>Performance</strong></td><td>Can be slower due to overhead.</td><td>Generally faster and simpler.</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-design-considerations"><strong>Design Considerations</strong></h2>
<p>When choosing between stateful and stateless architecture, consider the following factors:</p>
<ol>
<li><p><strong>Scalability Needs:</strong><br /> Stateless systems are ideal for applications that require scaling across multiple servers.</p>
</li>
<li><p><strong>User Experience:</strong><br /> Stateful applications provide smoother, continuous experiences, which are essential for certain workflows.</p>
</li>
<li><p><strong>Failure Handling:</strong><br /> Stateless applications are easier to manage in case of server failures.</p>
</li>
<li><p><strong>Resource Management:</strong><br /> Stateful applications may require more resources to manage sessions and state.</p>
</li>
</ol>
<h3 id="heading-hybrid-approach"><strong>Hybrid Approach:</strong></h3>
<p>Many modern systems adopt a hybrid approach, where critical user interactions are stateful (e.g., login sessions), while the rest of the interactions remain stateless.</p>
<hr />
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Choosing between stateful and stateless architectures depends on your application's requirements. Stateless systems offer simplicity and scalability, making them a popular choice for cloud-native applications. However, stateful systems are essential for delivering rich, interactive user experiences where context matters. Understanding these trade-offs is key to designing robust, scalable, and efficient distributed systems.</p>
<p><strong>Tip for Software Engineers:</strong> For large-scale systems, consider using <strong>stateless microservices</strong> with a <strong>stateful data store</strong> to balance performance and user experience.</p>
<hr />
<p>Are you working on designing stateful or stateless systems? Share your thoughts and experiences in the comments below!</p>
]]></content:encoded></item><item><title><![CDATA[Long-Polling vs WebSockets vs Server-Sent Events]]></title><description><![CDATA[Long-Polling, WebSockets, and Server-Sent Events are popular communication protocols between a client like a web browser and a web server. First, let’s start with understanding what a standard HTTP web request looks like. Following are a sequence of ...]]></description><link>https://anishratnawat.com/long-polling-vs-websockets-vs-server-sent-events</link><guid isPermaLink="true">https://anishratnawat.com/long-polling-vs-websockets-vs-server-sent-events</guid><category><![CDATA[longpolling]]></category><category><![CDATA[websockets]]></category><category><![CDATA[events]]></category><dc:creator><![CDATA[Anish Ratnawat]]></dc:creator><pubDate>Mon, 20 May 2019 18:30:00 GMT</pubDate><content:encoded><![CDATA[<p>Long-Polling, WebSockets, and Server-Sent Events are popular techniques for communication between a client, such as a web browser, and a web server. First, let’s start with understanding what a standard HTTP web request looks like. The following is the sequence of events for a regular HTTP request:</p>
<ol>
<li><p>Client opens a connection and requests data from the server.</p>
</li>
<li><p>The server calculates the response.</p>
</li>
<li><p>The server sends the response back to the client on the opened request.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733071284900/e44d1145-5504-46b9-b877-7563c54bea0a.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-ajax-polling">Ajax Polling</h3>
<p>Polling is a standard technique used by the vast majority of AJAX applications. The basic idea is that the client repeatedly polls (or requests) a server for data. The client makes a request and waits for the server to respond with data. If no data is available, an empty response is returned.</p>
<ol>
<li><p>Client opens a connection and requests data from the server using regular HTTP.</p>
</li>
<li><p>The requested webpage sends requests to the server at regular intervals (e.g., 0.5 seconds).</p>
</li>
<li><p>The server calculates the response and sends it back, just like regular HTTP traffic.</p>
</li>
<li><p>Client repeats the above three steps periodically to get updates from the server.</p>
</li>
</ol>
<p>The problem with polling is that the client has to keep asking the server for new data. As a result, many of the responses are empty, creating unnecessary HTTP overhead.</p>
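<p>The waste is easy to see in a toy simulation, where the "server" answers every poll immediately even when it has nothing new:</p>

```python
def serve_poll(pending_updates):
    """Plain polling: respond immediately, with data or with an empty body."""
    return pending_updates.pop(0) if pending_updates else None

pending_updates = ["event-1"]
responses = [serve_poll(pending_updates) for _ in range(5)]
print(responses)  # ['event-1', None, None, None, None]
```

<p>Four of the five round trips carried no data, yet each still paid the full HTTP request/response cost.</p>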
<h3 id="heading-http-long-polling">HTTP Long-Polling</h3>
<p>A variation of the traditional polling technique that allows the server to push information to a client, whenever the data is available. With Long-Polling, the client requests information from the server exactly as in normal polling, but with the expectation that the server may not respond immediately. That’s why this technique is sometimes referred to as a “Hanging GET”.</p>
<ul>
<li><p>If the server does not have any data available for the client, instead of sending an empty response, the server holds the request and waits until some data becomes available.</p>
</li>
<li><p>Once the data becomes available, a full response is sent to the client. The client then immediately re-requests information from the server, so that the server will almost always have a waiting request available that it can use to deliver data in response to an event.</p>
</li>
</ul>
<p>The basic life cycle of an application using HTTP Long-Polling is as follows:</p>
<ol>
<li><p>The client makes an initial request using regular HTTP and then waits for a response.</p>
</li>
<li><p>The server delays its response until an update is available, or until a timeout has occurred.</p>
</li>
<li><p>When an update is available, the server sends a full response to the client.</p>
</li>
<li><p>The client typically sends a new long-poll request, either immediately upon receiving a response or after a pause to allow an acceptable latency period.</p>
</li>
<li><p>Each Long-Poll request has a timeout. The client has to reconnect periodically after the connection is closed, due to timeouts.</p>
</li>
</ol>
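<p>The hold-until-data behavior in the steps above can be simulated with a blocking queue, where the queue timeout plays the role of the long-poll timeout (the event name and delays are made up):</p>

```python
import queue
import threading

events = queue.Queue()

def long_poll(timeout_s):
    """Hold the request until data arrives or the timeout expires."""
    try:
        return events.get(timeout=timeout_s)
    except queue.Empty:
        return None  # timed out: the client reconnects and polls again

# An update becomes available 0.1 s after the client starts waiting.
threading.Timer(0.1, events.put, args=("price-update",)).start()
print(long_poll(timeout_s=2))  # price-update
```

<p>Unlike plain polling, the response goes out the moment the data exists, and no empty responses are sent in the meantime.</p>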
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733071354969/a7459a05-79fc-4fcb-9bba-f63ae5c9a940.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-websockets"><strong>WebSockets</strong></h3>
<p>WebSocket provides full-duplex communication channels over a single TCP connection. It establishes a persistent connection between a client and a server that both parties can use to start sending data at any time. The client initiates a WebSocket connection through a process known as the WebSocket handshake. If the handshake succeeds, the server and client can exchange data in both directions at any time. The WebSocket protocol enables communication between a client and a server with lower overhead, facilitating real-time data transfer to and from the server. It provides a standardized way for the server to send content to the browser without being asked by the client, and allows messages to be passed back and forth while keeping the connection open. In this way, a two-way (bidirectional) ongoing conversation can take place between a client and a server.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733071398433/22c203c5-24a1-4ba0-9415-cf6df2249535.png" alt class="image--center mx-auto" /></p>
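<p>The handshake mentioned above is an ordinary HTTP Upgrade request. The server proves it understood the WebSocket protocol by answering the client’s <code>Sec-WebSocket-Key</code> header with an accept key derived as specified in RFC 6455:</p>

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed constant from RFC 6455

def accept_key(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept header value for a handshake response."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()

# The sample key used in RFC 6455 itself:
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

<p>After this exchange the HTTP semantics end, and both sides speak the WebSocket framing protocol over the same TCP connection.</p>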
<h3 id="heading-server-sent-events-sses">Server-Sent Events (SSEs)</h3>
<p>With SSE, the client establishes a persistent, long-lived connection with the server. The server uses this connection to push data to the client. If the client wants to send data to the server, it must use another channel, such as a regular HTTP request.</p>
<ol>
<li><p>Client requests data from a server using regular HTTP.</p>
</li>
<li><p>The requested webpage opens a connection to the server.</p>
</li>
<li><p>The server sends the data to the client whenever there’s new information available.</p>
</li>
</ol>
<p>SSEs are best when we need real-time traffic from the server to the client, or when the server generates data in a loop and sends multiple events to the client.</p>
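<p>On the wire, SSE is plain text: each event is one or more <code>data:</code> lines terminated by a blank line. A minimal parser for that framing (ignoring optional fields such as <code>event:</code> and <code>id:</code>):</p>

```python
def parse_sse(stream_text):
    """Collect the data payloads from a text/event-stream body."""
    events, data_lines = [], []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip())
        elif line == "" and data_lines:
            events.append("\n".join(data_lines))  # blank line ends the event
            data_lines = []
    return events

body = "data: first update\n\ndata: second update\n\n"
print(parse_sse(body))  # ['first update', 'second update']
```

<p>This text framing is why SSE works over plain HTTP with no protocol upgrade; in the browser, the <code>EventSource</code> API does this parsing for you.</p>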
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733071434110/89df28e6-10c9-4703-9fd0-dfd1fe5bfc47.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>Long-Polling, WebSockets, and Server-Sent Events each offer unique advantages and are suited to different use cases in client-server communication.</p>
<p>Long-Polling is a more efficient version of traditional polling, reducing unnecessary server requests by waiting for data to become available.</p>
<p>WebSockets provide a full-duplex communication channel, allowing for real-time, bidirectional data exchange with minimal overhead, making them ideal for applications requiring constant interaction, such as chat applications or live updates.</p>
<p>Server-Sent Events are optimal for scenarios where the server needs to push updates to the client, such as live news feeds or stock price updates, but do not require client-to-server communication.</p>
<p>Understanding the strengths and limitations of each protocol can help developers choose the most appropriate solution for their specific application needs.</p>
]]></content:encoded></item></channel></rss>