Routing and forwarding

Crux: How routers decide where a packet goes next: longest-prefix match in the FIB, CIDR aggregation, BGP between autonomous systems, and ECMP across equal-cost paths.

When a packet arrives at a router, the router has microseconds to decide where to send it. It consults a table that can hold on the order of a million destination prefixes, finds the best match, and transmits the packet, all at line rate. Understanding this mechanism explains why a single bad BGP announcement can reroute global traffic, and why ECMP can leave one path saturated by an elephant flow while the others sit idle.

CIDR and longest-prefix match

IPv4 addresses were once divided into rigid classes (A: /8, B: /16, C: /24). Classless Inter-Domain Routing (CIDR, RFC 4632) replaced classes with variable-length prefixes: any prefix length from /0 (the default route) to /32 (a single host) is valid.

A CIDR prefix like 10.0.0.0/24 means “the first 24 bits identify the network; the last 8 bits identify the host.” That prefix covers 256 addresses.

Prefix   Size                    Usable hosts
/24      256 addresses           254 (subtract .0 network, .255 broadcast)
/16      65,536 addresses        65,534
/30      4 addresses             2 (for point-to-point links)
/8       16,777,216 addresses    16,777,214
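
The sizes in the table follow from simple arithmetic: an IPv4 prefix of length n leaves 32 - n host bits. A minimal sketch, assuming the conventional rule that the network and broadcast addresses are reserved (the function name `prefix_size` is illustrative; the /31 and /32 handling is an added assumption based on RFC 3021 and host routes, not something the table covers):

```python
def prefix_size(prefix_len: int) -> tuple:
    """Return (total addresses, conventionally usable hosts) for an IPv4 prefix length."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("IPv4 prefix length must be between 0 and 32")
    total = 2 ** (32 - prefix_len)
    # Conventional subnets reserve the first (network) and last (broadcast) address.
    # Assumption: /31 point-to-point links (RFC 3021) and /32 host routes reserve nothing.
    usable = total if prefix_len >= 31 else total - 2
    return total, usable


for length in (24, 16, 30, 8):
    total, usable = prefix_size(length)
    print(f"/{length}: {total} addresses, {usable} usable")
```

The standard-library `ipaddress` module exposes the same total through `ip_network("10.0.0.0/24").num_addresses`.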

When multiple prefixes match a destination, the router picks the longest match (the most specific prefix). A packet to 10.0.0.5 matches both 10.0.0.0/8 and 10.0.0.0/24; the /24 wins because it is more specific.
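
Longest-prefix match can be sketched with Python's standard `ipaddress` module. The table contents and next-hop labels below are hypothetical, and the linear scan is for clarity only; real routers use tries or hardware lookups rather than a loop.

```python
import ipaddress

# Hypothetical forwarding entries: prefix -> next-hop label.
FIB = {
    "0.0.0.0/0": "default-gw",
    "10.0.0.0/8": "core-1",
    "10.0.0.0/24": "edge-7",
}


def longest_prefix_match(dst, fib):
    """Return (matching network, next hop) for the most specific prefix, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in fib.items():
        net = ipaddress.ip_network(prefix)
        # Keep the candidate only if it contains dst and is more specific than the current best.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best


print(longest_prefix_match("10.0.0.5", FIB))   # the /24 entry wins over the /8
print(longest_prefix_match("10.9.9.9", FIB))   # only the /8 and the default match
```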

CIDR enables route aggregation: a single announcement of 10.0.0.0/8 covers 16 million addresses. Without aggregation, the BGP routing table would exceed practical memory limits.
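
Aggregation can be illustrated on a small scale with `ipaddress.collapse_addresses`, which merges adjacent prefixes into the shortest covering set. This is a sketch of the arithmetic only; the helper name `aggregate` is illustrative, and real BGP aggregation is also subject to routing policy.

```python
import ipaddress


def aggregate(prefixes):
    """Collapse adjacent or overlapping prefixes into the fewest covering prefixes."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


# Four contiguous /24s collapse into a single /22 announcement.
print(aggregate(["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]))

# Non-adjacent prefixes cannot be merged and stay separate.
print(aggregate(["10.0.0.0/24", "10.0.2.0/24"]))
```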

Routing table vs forwarding table (FIB)

Routing table: built by routing protocols (BGP, OSPF, IS-IS). It stores all known prefixes, their attributes, and alternatives. May have millions of entries.

Forwarding Information Base (FIB): the data-plane table the router uses for every packet. It is a trimmed-down per-destination next-hop view compiled from the routing table. On hardware routers it is typically held in TCAM or other specialised forwarding-ASIC memory so lookups keep up with line rate.

The split enables a clean architecture:

  • Control plane (software): receives routing-protocol updates, computes best paths, writes the routing table, syncs the FIB.
  • Data plane (hardware ASIC): uses the FIB for every packet at terabit-per-second speed.

Routing table updates flow from BGP/OSPF → routing table → FIB in milliseconds. Bugs in the data plane (rare) manifest as silent packet loss rather than obvious crashes.
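
The routing table to FIB step can be sketched as a reduction: many candidate routes per prefix go in, one next hop per prefix comes out. The `Route` type and the numeric `preference` field (lower wins) are illustrative assumptions, not any vendor's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    protocol: str
    preference: int  # hypothetical ranking: lower value wins


def compile_fib(routing_table):
    """Keep only the preferred route per prefix and strip it down to a next hop."""
    best = {}
    for route in routing_table:
        current = best.get(route.prefix)
        if current is None or route.preference < current.preference:
            best[route.prefix] = route
    # The FIB drops protocol attributes and alternatives; only the next hop remains.
    return {prefix: route.next_hop for prefix, route in best.items()}


routing_table = [
    Route("10.0.0.0/8", "192.0.2.1", "bgp", 200),
    Route("10.0.0.0/8", "192.0.2.9", "ospf", 110),
    Route("10.0.0.0/24", "192.0.2.5", "bgp", 200),
]
print(compile_fib(routing_table))
```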

ARP and Neighbor Discovery

Once a router knows the next-hop IP it must learn that next-hop’s MAC address to build the link-layer frame. ARP (IPv4) broadcasts “who has 10.0.0.5?” on the local subnet; the owner replies with its MAC. The router caches this.
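
The caching behaviour can be sketched as a small table with expiry. The class `NeighborCache` and its single timeout are simplifying assumptions; real neighbour tables track several states (reachable, stale, and so on) rather than one timer.

```python
import time


class NeighborCache:
    """Toy IP -> MAC cache with a single expiry timer (a simplification)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the expiry logic can be tested
        self._entries = {}         # ip -> (mac, time learned)

    def learn(self, ip, mac):
        """Record the MAC from an ARP reply or neighbour advertisement."""
        self._entries[ip] = (mac, self._clock())

    def lookup(self, ip):
        """Return the cached MAC, or None if unknown or expired (caller must re-resolve)."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, learned_at = entry
        if self._clock() - learned_at > self._ttl:
            del self._entries[ip]
            return None
        return mac
```

A `None` result corresponds to the point where the router would send a fresh ARP request before it can build the frame.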

Neighbor Discovery (ND) is IPv6’s equivalent — it uses ICMPv6 multicast instead of broadcast, and also handles router advertisements, address autoconfiguration, and duplicate address detection.

Routing scale

  • Global BGP table size (2026): ~1,000,000 prefixes
  • ECMP typical path count: 4–64 equal-cost paths
  • ARP cache timeout (Linux): 60 s
  • BGP convergence time (tuned): 1–5 s
  • OSPF convergence (datacenter): ~200 ms
  • TCAM lookup (hardware FIB): nanoseconds

BGP: the Internet’s routing protocol

The Internet is divided into Autonomous Systems (AS) — networks operated by one organisation, each with a unique AS number. Within an AS, interior gateway protocols (OSPF, IS-IS) compute routes. Between ASes, BGP (RFC 4271) negotiates which prefixes each AS announces and prefers.

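BGP is a path-vector protocol: each route carries the list of AS numbers it has traversed, which also lets a router reject any route whose path already contains its own AS number (loop prevention). Route selection is an ordered comparison: the highest local preference wins first; among routes with equal local preference, the shorter AS-path is preferred; remaining ties are broken by attributes such as origin and MED. Communities are not compared directly; they are tags that policy uses to set those attributes. BGP is deliberately policy-driven: an AS can prefer expensive direct peering over cheap transit, or avoid certain ASes entirely.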
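
A heavily simplified sketch of BGP route comparison, assuming only three steps of the decision process: highest local preference, then shortest AS-path, then lowest MED. The real process in RFC 4271 has more steps, and MED is normally compared only between routes learned from the same neighbouring AS. The `BgpRoute` type is illustrative, and the AS numbers come from the range reserved for documentation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BgpRoute:
    prefix: str
    as_path: Tuple[int, ...]
    local_pref: int = 100
    med: int = 0


def best_route(candidates):
    """Pick one route: highest local_pref, then shortest AS-path, then lowest MED."""
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path), r.med))


short_path = BgpRoute("203.0.113.0/24", as_path=(64500, 64496))
preferred = BgpRoute("203.0.113.0/24", as_path=(64501, 64502, 64496), local_pref=200)

# Policy beats path length: the longer path wins because its local preference is higher.
print(best_route([short_path, preferred]).as_path)
```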

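Route leaks and hijacks. A misconfigured or malicious AS can announce a prefix it does not own. If the bogus announcement is more specific than the legitimate one, longest-prefix match makes it win everywhere it propagates, and traffic destined for the legitimate owner reroutes to the announcing AS. RPKI (Resource Public Key Infrastructure) lets prefix holders publish signed Route Origin Authorizations (ROAs) stating which AS may originate a prefix and up to what prefix length; receivers check each announcement's origin against them (more on this in the security lesson).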
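
RPKI-based origin validation classifies an announcement as valid, invalid, or not-found. A minimal sketch of that classification, following the outline of RFC 6811; the `Roa` type and function name are illustrative, the prefixes and AS numbers come from documentation ranges, and cryptographic verification of the ROAs themselves is assumed to have already happened.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    prefix: str      # prefix the holder has authorised
    max_length: int  # longest prefix length allowed within it
    asn: int         # AS permitted to originate it


def validate_origin(announced, origin_asn, roas):
    """Return 'valid', 'invalid', or 'not-found' for an announced prefix and origin AS."""
    net = ipaddress.ip_network(announced)
    covering = []
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # subnet_of raises TypeError on mixed IPv4/IPv6, so compare versions first.
        if roa_net.version == net.version and net.subnet_of(roa_net):
            covering.append(roa)
    if not covering:
        return "not-found"
    for roa in covering:
        if roa.asn == origin_asn and net.prefixlen <= roa.max_length:
            return "valid"
    return "invalid"


roas = [Roa("203.0.113.0/24", max_length=24, asn=64496)]
print(validate_origin("203.0.113.0/24", 64496, roas))    # authorised origin
print(validate_origin("203.0.113.0/24", 64511, roas))    # wrong origin AS
print(validate_origin("203.0.113.128/25", 64496, roas))  # more specific than max_length
```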

ECMP: splitting traffic across equal-cost paths

When a router has multiple equal-cost paths to a destination, it uses ECMP (Equal-Cost Multi-Path) to distribute load. The router hashes the 5-tuple (source IP, source port, destination IP, destination port, protocol) and picks a path by hash modulo path-count. All packets of one TCP flow hash to the same path (preserving ordering within the flow); different flows distribute across paths.
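
The hash-and-modulo selection can be sketched as follows. SHA-256 is used here only because it is in the standard library; forwarding hardware typically uses cheaper hash functions, and the path names are hypothetical.

```python
import hashlib


def ecmp_pick(five_tuple, paths):
    """Choose a path for a flow by hashing its 5-tuple modulo the path count."""
    # five_tuple: (src_ip, src_port, dst_ip, dst_port, protocol)
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(paths)
    return paths[index]


paths = ["spine-1", "spine-2", "spine-3", "spine-4"]
flow = ("10.0.0.5", 40312, "10.0.9.7", 443, "tcp")

# Every packet of the same flow yields the same path, which preserves ordering.
assert ecmp_pick(flow, paths) == ecmp_pick(flow, paths)
print(ecmp_pick(flow, paths))
```

One consequence of plain modulo is that adding or removing a path changes the mapping for most flows, which is one reason some designs use consistent (resilient) hashing instead.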

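Datacentre Clos (leaf-spine) fabrics rely on ECMP for bisection bandwidth. Pitfalls:

  • Elephant flows: ECMP balances flows, not bytes, so one large flow saturates one path while others sit idle.
  • Hash polarisation: when consecutive hops use the same hash function and seed, flows that were steered onto one path all hash the same way again at the next hop, so some downstream links are systematically underused.

Mitigations: modern designs use weighted ECMP, flowlet switching (moving a flow to a different path during idle gaps between its bursts, so packets are not reordered), or load-aware schemes such as CONGA.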
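
The elephant-flow pitfall can be made concrete with a small simulation: per-flow hashing spreads flow counts evenly, but byte counts follow whichever path the large flow lands on. Flow names and sizes are made up for illustration, and SHA-256 stands in for a hardware hash.

```python
import hashlib


def path_index(flow_id, n_paths):
    """Map a flow identifier to a path index by hash modulo path count."""
    digest = hashlib.sha256(flow_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_paths


def load_per_path(flows, n_paths):
    """Sum the bytes that each path carries when every flow is pinned to one path."""
    load = [0] * n_paths
    for flow_id, size_bytes in flows.items():
        load[path_index(flow_id, n_paths)] += size_bytes
    return load


# 100 small "mouse" flows of 1 kB each plus one 10 MB elephant flow.
flows = {f"mouse-{i}": 1_000 for i in range(100)}
flows["elephant"] = 10_000_000

load = load_per_path(flows, n_paths=4)
print(load)  # one entry is dominated by the elephant regardless of how the mice spread
```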
Why this works

Why BGP allows asymmetric routing. Each AS independently selects its outbound path per destination. The path a packet takes outbound may differ from the path a response takes inbound — the two sides of a conversation may traverse completely different ASes. This is legal and common (called asymmetric routing). It matters for latency (outbound: 60 ms, return: 150 ms because of a detour), for traceroute interpretation (trace in each direction separately), and for stateful firewalls (which can see only one direction of traffic).

Trace it

Trace a packet from a home PC in London to a server in San Francisco.

  1. Packet leaves PC. First-hop router?
  2. Packet enters the ISP. What does the ISP edge router do?
  3. Packet traverses ~10–15 hops across multiple ASes. What does each router do?
  4. A multi-AS segment uses MPLS. What changes?
  5. Packet arrives at the destination network. What happens?

Quiz

  1. What does a /24 CIDR prefix mean?
  2. Why does a router maintain both a routing table and a forwarding table (FIB)?

Order the steps

Order route resolution at a router when a packet arrives:

  1. Decrement TTL; drop and send ICMP Time Exceeded if TTL reaches 0
  2. Look up destination IP in FIB (longest-prefix match)
  3. Identify the next-hop IP and outbound interface
  4. ARP/ND resolve next-hop MAC if not cached
  5. Build new L2 frame with destination MAC
  6. Transmit frame on output interface
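
The six steps can be strung together in one function. This is a sketch under several assumptions: packets and frames are plain dictionaries, the FIB maps each prefix to a hypothetical `(next_hop_ip, interface)` pair, directly connected destinations are ignored, and a missing neighbour entry just reports that resolution is needed instead of sending an ARP request.

```python
import ipaddress


def forward(packet, fib, neighbors):
    """Return ('transmit', frame), ('queue', reason), or ('drop', reason)."""
    # Step 1: decrement TTL; an expired packet is dropped with an ICMP error.
    ttl = packet["ttl"] - 1
    if ttl <= 0:
        return ("drop", "icmp-time-exceeded")

    # Step 2: longest-prefix match on the destination address.
    addr = ipaddress.ip_address(packet["dst"])
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in fib.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return ("drop", "icmp-destination-unreachable")

    # Step 3: the most specific match supplies next hop and outbound interface.
    _, (next_hop, iface) = max(matches, key=lambda m: m[0].prefixlen)

    # Step 4: resolve the next hop's MAC; without it the packet waits for ARP/ND.
    mac = neighbors.get(next_hop)
    if mac is None:
        return ("queue", f"resolve {next_hop} on {iface}")

    # Steps 5 and 6: build the new L2 frame and hand it to the output interface.
    frame = {"dst_mac": mac, "iface": iface, "payload": {**packet, "ttl": ttl}}
    return ("transmit", frame)


fib = {"0.0.0.0/0": ("192.0.2.1", "eth0"), "10.0.0.0/24": ("10.0.0.254", "eth1")}
neighbors = {"10.0.0.254": "aa:bb:cc:dd:ee:ff"}
print(forward({"dst": "10.0.0.5", "ttl": 64}, fib, neighbors))
```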
Recall before you leave
  1. What is longest-prefix match, and why is it needed?
  2. Explain why a single misbehaving BGP announcement can take a domain offline globally.
  3. Why can ECMP cause elephant-flow problems and what do modern designs do about it?
Recap

Routing builds a global map of reachable prefixes; forwarding uses it at line rate. CIDR variable-length prefixes enable aggregation — a /8 announcement covers 16 million addresses and keeps the BGP table manageable. Routers split control plane (routing table, BGP/OSPF updates) from data plane (FIB in TCAM hardware, nanosecond lookups). BGP propagates prefix reachability between Autonomous Systems via path-vector with AS-path; it is policy-driven and allows asymmetric routing by design. ECMP distributes flows across equal-cost paths by 5-tuple hash — great for aggregate throughput, problematic for elephant flows.
