How BGP Actually Works: The Routing Protocol Holding the Internet Together

The internet is not one network. It is a negotiated truce between thousands of independently operated networks that often have different commercial incentives, different security postures, and different ideas about what traffic they want to carry. BGP, the Border Gateway Protocol, is the system that makes this loose federation look like a single connected whole.

People like to describe BGP as "the internet's routing protocol," which is true but not especially informative. What makes BGP interesting is that it is not trying to compute the mathematically shortest path. It is trying to compute an acceptable path under policy. A path can be longer, slower, or more expensive in pure technical terms and still win because it satisfies business relationships and local operator preferences. That is what makes BGP problems so weird. When something breaks, it is rarely because the graph algorithm was wrong. It is because a human being somewhere announced, filtered, preferred, or redistributed the wrong thing.

To understand why BGP is both indispensable and fragile, we need to look at how interdomain routing actually works.

1. Autonomous Systems: The Internet's Real Administrative Units

An Autonomous System, or AS, is a network under a single routing policy. Usually this means one organisation, one administrative domain, one set of BGP decisions.

Examples include:

  • an ISP in Athens
  • a hyperscale cloud provider
  • a mobile carrier operating across Europe
  • a content delivery network
  • a large enterprise connected to multiple upstream providers

Each AS is identified by an Autonomous System Number, or ASN. Routers speaking BGP use ASNs to express routing paths and policy relationships.

Inside an AS, operators are free to use whatever interior routing and forwarding technology they like: OSPF, IS-IS, EIGRP, static routing, Segment Routing, MPLS, EVPN, or something custom. BGP does not care which, as long as its next hops are reachable. BGP operates between ASes and answers one question:

Which neighbouring AS should I hand this traffic to if I want to reach prefix X?

That prefix is usually an IP block such as 203.0.113.0/24 or 2001:db8:1234::/48.

This distinction matters. BGP is interdomain routing, not intradomain routing. It decides which external path wins, not how packets move inside a particular provider's backbone.

2. BGP Is a Path-Vector Protocol, Not a Link-State Protocol

If you have worked with OSPF or IS-IS, your instinct is probably to think in terms of topology maps and shortest path trees. BGP is different.

Link-State Thinking

In a link-state protocol, routers flood topology information. Everyone builds roughly the same graph, then runs Dijkstra's algorithm to compute shortest paths.

Distance-Vector Thinking

In a classic distance-vector protocol, routers tell neighbours "I can reach destination X with cost Y," and neighbours iterate until the network converges.

Path-Vector Thinking

BGP adds one crucial piece: the full AS path.

Instead of saying only:

I can reach 203.0.113.0/24 with cost 5

BGP says:

I can reach 203.0.113.0/24 via AS path 64510 64496 64497

That AS_PATH attribute gives BGP loop prevention. If a router sees its own ASN already in the path, it rejects the route. No need for a global topology map.
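A minimal sketch of that loop check follows. It models AS_PATH as a plain list of ASNs; real AS_PATHs can also contain AS_SET segments, which are ignored here. The function names are hypothetical.

```python
def accept_ebgp_route(local_asn: int, as_path: list) -> bool:
    """Reject a route whose AS_PATH already contains our own ASN (loop prevention)."""
    return local_asn not in as_path


def export_path(local_asn: int, as_path: list) -> list:
    """When advertising to an eBGP neighbour, a router puts its own ASN in front."""
    return [local_asn] + as_path


path = [64510, 64496, 64497]
print(accept_ebgp_route(64496, path))  # False: 64496 is already in the path
print(accept_ebgp_route(64500, path))  # True
print(export_path(64500, path))        # [64500, 64510, 64496, 64497]
```

No router needs a map of the whole internet for this: the check is a membership test on the path it was handed.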

This makes BGP scalable enough for the global internet, but it also means path selection is shaped by path attributes and local policy, not by any universal notion of optimality.

3. BGP Neighbours and Sessions: The Plumbing First

BGP routers, usually called peers or neighbours, form long-lived TCP sessions on port 179. This is already a clue about BGP's design philosophy. The protocol offloads reliability, ordering, and retransmission to TCP and concentrates on route exchange and policy.

The session establishment flow looks like this:

  1. TCP connection comes up.
  2. Both sides exchange BGP OPEN messages.
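  3. Each side states its ASN (which the other side checks against its configuration), BGP version, proposed hold time, and capabilities. The session uses the smaller of the two hold times and only the capabilities both sides support.
  4. Periodic KEEPALIVE messages maintain the session.
  5. Routes are exchanged in UPDATE messages.
  6. Errors trigger a NOTIFICATION, after which the session is closed.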

The four fundamental BGP message types are:

Type          Purpose
OPEN          Start session and negotiate capabilities
UPDATE        Announce or withdraw routes
KEEPALIVE     Confirm session liveness
NOTIFICATION  Signal protocol error and terminate
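All four message types share a fixed 19-byte header, defined in RFC 4271. It consists of a 16-byte marker of all ones, a 2-byte total length, and a 1-byte type code. A KEEPALIVE is just that header with no body, which makes it an easy example. The sketch below uses only Python's struct module. Function names are hypothetical, and OPEN and UPDATE bodies are not parsed.

```python
import struct

# Type codes from RFC 4271, section 4.1.
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}
MARKER = b"\xff" * 16   # 16-byte marker, all ones
HEADER_LEN = 19         # marker (16) + length (2) + type (1)


def build_keepalive() -> bytes:
    """A KEEPALIVE is only the fixed header: length 19, type 4."""
    return MARKER + struct.pack("!HB", HEADER_LEN, 4)


def parse_header(data: bytes) -> tuple:
    """Return (total_length, type_name) from the first 19 bytes of a message."""
    if len(data) < HEADER_LEN or data[:16] != MARKER:
        raise ValueError("not a BGP message header")
    length, msg_type = struct.unpack("!HB", data[16:HEADER_LEN])
    return length, MESSAGE_TYPES.get(msg_type, f"UNKNOWN({msg_type})")


msg = build_keepalive()
print(len(msg), parse_header(msg))  # 19 (19, 'KEEPALIVE')
```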

Modern BGP capabilities negotiated in the OPEN message often include:

  • 4-byte ASNs
  • Multiprotocol extensions for IPv6 and VPN families
  • Route Refresh
  • Graceful Restart
  • Add-Path in some deployments

Without capabilities, BGP would be stuck with the assumptions of the 1990s.

4. What a Route Advertisement Actually Contains

The UPDATE message is the core of BGP. It can announce reachable prefixes, withdraw old ones, and attach path attributes that influence selection.

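A simplified route advertisement, as a router inside the receiving AS might hold it (LOCAL_PREF is only carried on iBGP sessions within an AS), might look like this: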

NLRI: 203.0.113.0/24
AS_PATH: 64520 64510
NEXT_HOP: 198.51.100.14
LOCAL_PREF: 200
MED: 50
COMMUNITY: 64520:100
ORIGIN: IGP

The most important attributes are:

AS_PATH

The list of ASNs that the route has traversed. It is used for loop prevention and as a rough preference signal: shorter paths are preferred, but only after higher-priority attributes such as LOCAL_PREF have been compared.

NEXT_HOP

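The IP address to which traffic should be forwarded if this route is chosen. On an eBGP session this is usually the neighbour's directly connected address. iBGP, including route reflection, passes that next hop along unchanged by default (unless the border router rewrites it to its own address with next-hop-self), so routers inside the AS must be able to reach it through the interior routing protocol.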

LOCAL_PREF

A non-transitive attribute used inside one AS to rank exit points. Higher is preferred. This is one of the most important policy levers because operators can say, for example:

  • prefer customer-learned routes over peer-learned routes
  • prefer one transit provider over another
  • steer outbound traffic to a cheaper exchange point

MED

The Multi-Exit Discriminator suggests which entry point another AS should prefer when multiple interconnects exist. Lower is preferred, but MED is evaluated only after LOCAL_PREF, AS_PATH length, and ORIGIN, and by default it is compared only between routes received from the same neighbouring AS.

ORIGIN

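A historical marker recording how the route entered BGP: IGP (from an interior protocol or a network statement), EGP (from the long-obsolete Exterior Gateway Protocol), or INCOMPLETE (typically redistribution from elsewhere). In practice this attribute is far less important than others.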

COMMUNITIES

One of the most operationally useful features in BGP. Communities are tags attached to routes that upstreams and internal policy engines can act on.

A community can mean things like:

  • do not advertise this route to peers
  • prepend my ASN twice when sending to transit provider X
  • lower local preference in region Y
  • treat this as blackhole traffic during DDoS mitigation

Communities are not standardised in meaning across the internet, but they are standardised in format. Their meaning is defined by the operator that interprets them.
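Because the format is fixed and the meaning is local, a parser can decode the numbers but not the intent. Here is a minimal sketch of standard communities (RFC 1997), assuming the usual convention of a 32-bit value written as two 16-bit halves. It also includes a few well-known values whose meaning is standardised: NO_EXPORT, NO_ADVERTISE, and BLACKHOLE from RFC 7999. The function names are hypothetical.

```python
# A standard community is one 32-bit value, conventionally written "high:low",
# where the high half is usually the ASN of the operator defining the tag.
WELL_KNOWN = {
    0xFFFFFF01: "NO_EXPORT",
    0xFFFFFF02: "NO_ADVERTISE",
    0xFFFF029A: "BLACKHOLE",  # 65535:666, RFC 7999
}


def parse_community(text: str) -> int:
    """Turn '64520:100' into its 32-bit wire value."""
    high, low = (int(part) for part in text.split(":"))
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("each half of a standard community is 16 bits")
    return (high << 16) | low


def format_community(value: int) -> str:
    """Render a 32-bit community, using its name if it is well known."""
    return WELL_KNOWN.get(value, f"{value >> 16}:{value & 0xFFFF}")


print(hex(parse_community("64520:100")))               # 0xfc080064
print(format_community(parse_community("65535:666")))  # BLACKHOLE
```

What 64520:100 actually means is whatever AS 64520 documents it to mean. The code only guarantees that everyone encodes it the same way.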

5. How BGP Chooses a Best Path

BGP often learns multiple routes to the same prefix. It must choose exactly one best path for installation into the routing table, unless advanced features like multipath are enabled.

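The exact decision process varies by vendor. Routes whose next hop is unreachable are ignored outright, and some implementations add proprietary steps (Cisco's weight, for example, is checked before LOCAL_PREF). A typical order is: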

  1. Highest LOCAL_PREF
  2. Prefer locally originated routes
  3. Shortest AS_PATH
  4. Lowest ORIGIN type (IGP, then EGP, then INCOMPLETE)
  5. Lowest MED
  6. Prefer eBGP over iBGP
  7. Lowest interior cost to the next hop
  8. Oldest path, lowest router ID, or similar tie-breakers

That ordering tells you what BGP really is: a policy engine with routing attached.

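If an operator applies an inbound policy like this (Cisco IOS-style syntax) to routes learned from the cheaper provider:

route-map PREFER-CHEAP-TRANSIT permit 10
  set local-preference 300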

then a longer path through a cheaper provider may beat a shorter path through a more expensive one. This is normal.

A Concrete Example

Imagine AS 64500 in Berlin receives two routes to 203.0.113.0/24:

Route A:
  AS_PATH: 64510 64496
  LOCAL_PREF: 100
 
Route B:
  AS_PATH: 64520 64530 64496
  LOCAL_PREF: 200

Even though Route B has a longer AS path, it wins because LOCAL_PREF is evaluated first.

This surprises people who expect shortest path routing. BGP is not trying to minimise hops. It is trying to satisfy operator intent.
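The example above can be replayed with a small elimination-style sketch of the decision list. It is a simplification under several assumptions. It skips vendor-specific steps such as weight and does not model locally originated routes. It compares MED only when all remaining candidates share a neighbouring AS, and it uses the route name as a stand-in for router-ID tie-breaking. The Route class and best_path function are hypothetical.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}  # lower is better


@dataclass
class Route:
    name: str
    as_path: list           # non-empty: locally originated routes are not modelled
    local_pref: int = 100
    origin: str = "IGP"
    med: int = 0
    ebgp: bool = True
    igp_cost: int = 0       # interior cost to reach the NEXT_HOP


def keep_best(routes, key):
    """Keep only the candidates that tie for the lowest value of key(route)."""
    best = min(key(r) for r in routes)
    return [r for r in routes if key(r) == best]


def best_path(routes):
    """Eliminate candidates step by step, in the order listed above."""
    routes = keep_best(routes, lambda r: -r.local_pref)          # highest LOCAL_PREF
    routes = keep_best(routes, lambda r: len(r.as_path))         # shortest AS_PATH
    routes = keep_best(routes, lambda r: ORIGIN_RANK[r.origin])  # lowest ORIGIN
    # Lowest MED, compared only when every remaining route came from the same
    # neighbouring AS (the first ASN in its path).
    if len({r.as_path[0] for r in routes}) == 1:
        routes = keep_best(routes, lambda r: r.med)
    routes = keep_best(routes, lambda r: 0 if r.ebgp else 1)     # eBGP over iBGP
    routes = keep_best(routes, lambda r: r.igp_cost)             # closest exit
    return min(routes, key=lambda r: r.name)  # stand-in for router-ID tie-breaking


route_a = Route("A", as_path=[64510, 64496], local_pref=100)
route_b = Route("B", as_path=[64520, 64530, 64496], local_pref=200)
print(best_path([route_a, route_b]).name)  # B
```

Route B never reaches the AS_PATH comparison on equal terms: the first step has already removed Route A.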

6. eBGP vs iBGP: External Exchange and Internal Distribution

BGP comes in two operational flavours:

  • eBGP, External BGP, between different ASes
  • iBGP, Internal BGP, within the same AS

eBGP

This is the easy part conceptually. Two different ASes connect and exchange routes. When AS 64500 peers with AS 64510 at an internet exchange in Amsterdam, that is eBGP.

iBGP

Inside a large network, not every router peers directly with the outside world. Routes learned at edge routers need to be distributed internally. That is where iBGP comes in.

There is a famous complication: routes learned from one iBGP peer are not, by default, advertised to another iBGP peer. AS_PATH does not change while a route moves between routers of the same AS, so it cannot catch internal loops, and this rule prevents them instead. The cost is scaling: a full mesh of n iBGP routers requires n * (n - 1) / 2 sessions.

For 100 routers, that is 4,950 sessions. For 1000 routers, it is 499,500. Clearly not ideal.

Two major solutions exist:

Route Reflectors

Certain routers act as central redistribution points. Clients peer with the reflector instead of with every other client. This reduces session count dramatically and is the dominant design in large networks.
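The arithmetic is easy to check, along with the saving from route reflection. The sketch assumes one simple, hypothetical design: two reflectors meshed with each other, with every other router a client of both.

```python
def full_mesh_sessions(n):
    """iBGP sessions when every router peers with every other router."""
    return n * (n - 1) // 2


def reflector_sessions(clients, reflectors):
    """Sessions when every client peers with every reflector and the
    reflectors are fully meshed among themselves (one simple design)."""
    return clients * reflectors + full_mesh_sessions(reflectors)


for n in (100, 1000):
    # Same router count either way: n - 2 clients plus 2 reflectors.
    print(n, full_mesh_sessions(n), reflector_sessions(n - 2, 2))
# 100 4950 197
# 1000 499500 1997
```

Full-mesh growth is quadratic, while the reflector design grows linearly with the number of clients.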

Confederations

A large AS is split internally into sub-ASes. Routers run iBGP within each sub-AS and eBGP-like sessions between sub-ASes, while the whole confederation still presents as one public AS externally. This is less common than route reflection but still used in some networks.

7. Peering, Transit, and Why Business Relationships Shape the Internet

BGP cannot be understood purely as a protocol. Commercial relationships determine what routes are even advertised in the first place.

Viewed from each end of a link, three relationships are common:

Customer to Provider

The customer pays the provider for internet connectivity. The provider usually advertises a full table, a partial table, or just a default route to the customer, and advertises the customer's prefixes onward to the rest of the world.

Peer to Peer

Two networks of roughly similar value exchange traffic for mutual benefit, usually only for each other's customers, not for the entire internet. This is "settlement-free peering" in many cases.

Provider to Customer

The inverse of the first relation from the provider's point of view.

These relationships create routing policies such as valley-free routing. In simplified form, traffic can:

  • go uphill from customer to provider
  • cross at most one peering link between peers
  • then go downhill from provider to customer

What it should not do is go uphill, then downhill, then uphill again using one network as unpaid transit between two others.

This is why route export policy matters so much. A peer-learned route is typically not exported to another peer or provider. Customer routes are more freely exported because carrying them is paid business.
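These export rules are often summarised as the Gao-Rexford conditions. Here is a minimal sketch, assuming each neighbour is labelled customer, peer, or provider, and our own prefixes are labelled own. The table and function name are hypothetical.

```python
# Who may receive a route, keyed by where we learned it. Labels describe the
# neighbour's role relative to us; "own" marks prefixes we originate.
EXPORT_TO = {
    "own":      {"customer", "peer", "provider"},
    "customer": {"customer", "peer", "provider"},  # paid business: tell everyone
    "peer":     {"customer"},                      # never give free transit
    "provider": {"customer"},                      # never give free transit
}


def may_export(learned_from: str, send_to: str) -> bool:
    return send_to in EXPORT_TO[learned_from]


print(may_export("peer", "provider"))  # False: this would be a route leak
print(may_export("customer", "peer"))  # True
```

A route leak is, in these terms, an export that this table would have refused.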

BGP therefore encodes economics just as much as connectivity.

8. Route Leaks and Hijacks: What Happens When the Trust Model Fails

BGP's original trust model is brutally simple: if your neighbour says they can reach a prefix, and your policy allows it, you may believe them.

That worked tolerably when the internet was smaller and more collegial. At global scale, it is a constant source of operational incidents.

Prefix Hijack

A network originates a prefix it does not actually own. If other networks accept the announcement and prefer it, traffic for that prefix is diverted.

Example:

Legitimate:
  AS 64496 originates 203.0.113.0/24
 
Hijack:
  AS 64566 also originates 203.0.113.0/24

If enough of the internet prefers the false route, traffic will flow to AS 64566. That traffic might be blackholed, inspected, or forwarded onward after interception.

More-Specific Hijack

This is even more effective. IP routing prefers the longest matching prefix.

If the legitimate owner advertises:

203.0.112.0/23

and the hijacker advertises:

203.0.112.0/24
203.0.113.0/24

then every router that accepts the two /24s will send traffic for the whole range toward the hijacker, even though the legitimate /23 is still present, because each /24 is a longer match than the /23.

Route Leak

A network accidentally or improperly re-advertises routes learned from one relationship into another where they should not go. This is not necessarily a malicious hijack. It is often a policy failure.

For example, a customer receives a full table from provider A and then accidentally advertises that full table to provider B, effectively claiming to be a transit path for huge portions of the internet. Large outages have happened exactly this way.

9. Securing BGP: Partial Fixes for a Protocol That Shipped Without Them

Because BGP was not designed with strong authentication of route origin or path integrity, several protective layers have been added over time.

Prefix Filtering

The most basic defence. Upstreams maintain lists of which prefixes each customer is allowed to announce and reject everything else.

This sounds trivial. It is also one of the most effective controls in practice.

IRR-Based Filtering

Operators use Internet Routing Registry objects to build prefix filters automatically. The quality of IRR data varies, and stale records are common, but it is better than no validation at all.

RPKI and Route Origin Validation

Resource Public Key Infrastructure lets the holder of IP address space create Route Origin Authorisations, or ROAs, saying which ASN is allowed to originate a prefix.

A router validating a BGP announcement can classify it as:

  • Valid
  • Invalid
  • Not Found

If AS 64496 has a ROA authorising only itself to originate 203.0.113.0/24, then an origin by AS 64566 can be marked invalid.
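The classification follows the logic of RFC 6811. An announcement is Not Found if no ROA covers it. It is Valid if some covering ROA matches both the origin AS and the ROA's maxLength field, which limits how specific an announcement may be. Otherwise it is Invalid. The sketch below assumes ROAs have already been fetched and validated into a simple in-memory list; real routers receive them from an RPKI validator. The Roa class and function name are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class Roa:
    prefix: Network
    max_length: int  # the ROA's maxLength: longest announcement it authorises
    asn: int


def validate_origin(prefix, origin_asn, roas):
    """Classify an announcement as Valid, Invalid or Not Found."""
    net = ipaddress.ip_network(prefix)
    covering = [r for r in roas
                if r.prefix.version == net.version and net.subnet_of(r.prefix)]
    if not covering:
        return "Not Found"  # no ROA says anything about this address space
    for roa in covering:
        if roa.asn == origin_asn and net.prefixlen <= roa.max_length:
            return "Valid"
    return "Invalid"        # covered, but no ROA matches both origin and length


roas = [Roa(ipaddress.ip_network("203.0.113.0/24"), 24, 64496)]
print(validate_origin("203.0.113.0/24", 64496, roas))    # Valid
print(validate_origin("203.0.113.0/24", 64566, roas))    # Invalid: wrong origin AS
print(validate_origin("203.0.113.128/25", 64496, roas))  # Invalid: longer than maxLength
print(validate_origin("198.51.100.0/24", 64566, roas))   # Not Found
```

The third case shows why maxLength matters: a tight maxLength also makes a more-specific hijack from the legitimate origin AS look invalid.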

This is a major improvement, but it only validates the origin AS, not the entire AS path.

BGPsec

BGPsec attempts to cryptographically protect the AS path itself, but deployment complexity and operational cost have limited adoption. At internet scale, security solutions that are technically sound but operationally painful tend to struggle.

TTL Security and Session Authentication

For direct peer session protection, operators may use:

  • MD5 or TCP-AO authentication on BGP sessions
  • Generalized TTL Security Mechanism (GTSM), which sends BGP packets with a TTL of 255 and drops any that arrive with a lower TTL, i.e. packets that did not originate one hop away

These protect the session, not the route semantics.

10. Convergence: Why BGP Is Slow by Design

When an interior routing protocol changes, convergence is often measured in sub-seconds or a few seconds. BGP is different.

Global BGP convergence is intentionally conservative because instability is dangerous. If routers reacted instantly to every flap with unrestricted re-advertisement, the control plane would thrash.

Important stabilising mechanisms include:

MRAI

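Minimum Route Advertisement Interval limits how frequently a router sends successive updates for the same destination to a neighbour (RFC 4271 suggests 30 seconds for eBGP and 5 seconds for iBGP), so bursts of changes are batched rather than forwarded one by one.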

Route Flap Damping

Historically used to penalise unstable routes, though it was often too aggressive and is less favoured today on the wider internet.
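Flap damping keeps a per-prefix penalty that grows with each flap and decays exponentially over time. The sketch below assumes the parameter values often quoted as vendor defaults: 1000 per flap, suppression above 2000, reuse below 750, and a 15-minute half-life. It ignores the maximum suppression time that real implementations also enforce. The class name is hypothetical.

```python
# Parameters commonly quoted as vendor defaults; treat them as assumptions.
PENALTY_PER_FLAP = 1000
SUPPRESS_LIMIT = 2000
REUSE_LIMIT = 750
HALF_LIFE_S = 15 * 60


class DampingState:
    """Decaying flap penalty for one prefix learned from one neighbour."""

    def __init__(self):
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay_to(self, now):
        # Exponential decay: the penalty halves every HALF_LIFE_S seconds.
        self.penalty *= 2 ** (-(now - self.last_update) / HALF_LIFE_S)
        self.last_update = now
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False

    def flap(self, now):
        self._decay_to(now)
        self.penalty += PENALTY_PER_FLAP
        if self.penalty > SUPPRESS_LIMIT:
            self.suppressed = True

    def usable(self, now):
        self._decay_to(now)
        return not self.suppressed


state = DampingState()
for t in (0, 60, 120):  # three flaps, one minute apart (times in seconds)
    state.flap(t)
print(state.usable(120), state.usable(1920))  # False True
```

Three flaps in two minutes keep the route suppressed for about half an hour. That penalty is easy to justify for a genuinely unstable route and hard to justify for one that flapped once during a maintenance window, which is part of why damping fell out of favour.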

Graceful Restart

Allows temporary control-plane restart without immediate traffic loss, assuming forwarding can continue while routing state is rebuilt.

The result is that BGP failure handling is often measured in seconds to minutes depending on topology, policy, and failure mode. That is acceptable for many interdomain situations because the alternative, global instability, is worse.

11. Routing Tables, FIB Pressure, and the Cost of Internet Growth

The global BGP table is not static, and it is not small.

Every additional prefix that must be carried by the routing system increases pressure on:

  • control-plane memory
  • route processing CPU
  • router convergence time
  • forwarding table capacity in hardware

This matters because routers do not just store routes in one place.

Adj-RIB-In, Loc-RIB, and FIB

Conceptually, a BGP-speaking router deals with several stages of routing information:

  • Adj-RIB-In: routes learned from neighbours before local policy decides what to do
  • Loc-RIB: the router's selected best paths after policy
  • Adj-RIB-Out: routes prepared for advertisement to a given neighbour
  • FIB: the actual forwarding entries installed in hardware or kernel forwarding tables

This separation is why BGP is not just "a list of prefixes." A single prefix may exist in multiple policy contexts before one forwarding decision wins.
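A minimal sketch of those stages follows. It assumes routes are plain dictionaries and policies are ordinary functions that return a modified route, or None to drop it. The neighbour names, next-hop addresses, the learned_from field, and all function names are hypothetical. Best-path selection is cut down to LOCAL_PREF alone.

```python
def build_ribs(adj_rib_in, import_policy, export_policies):
    """adj_rib_in maps each neighbour to the routes exactly as received.
    A policy returns a (possibly modified) route, or None to drop it."""
    # Import policy runs over every neighbour's Adj-RIB-In.
    accepted = []
    for routes in adj_rib_in.values():
        for route in routes:
            kept = import_policy(route)
            if kept is not None:
                accepted.append(kept)

    # Loc-RIB: one selected route per prefix (simplified: highest LOCAL_PREF).
    loc_rib = {}
    for route in accepted:
        current = loc_rib.get(route["prefix"])
        if current is None or route["local_pref"] > current["local_pref"]:
            loc_rib[route["prefix"]] = route

    # Adj-RIB-Out: a separately filtered view for each neighbour.
    adj_rib_out = {}
    for neighbour, policy in export_policies.items():
        exported = (policy(route) for route in loc_rib.values())
        adj_rib_out[neighbour] = [r for r in exported if r is not None]

    # FIB: forwarding only needs prefix -> next hop.
    fib = {prefix: route["next_hop"] for prefix, route in loc_rib.items()}
    return loc_rib, adj_rib_out, fib


def prefer_customers(route):
    route = dict(route)  # copy, so the Adj-RIB-In entry stays untouched
    route["local_pref"] = 200 if route["learned_from"] == "customer" else 100
    return route


def only_customer_routes(route):
    return route if route["learned_from"] == "customer" else None


adj_rib_in = {
    "transit-a": [{"prefix": "203.0.113.0/24", "next_hop": "198.51.100.1",
                   "learned_from": "provider"}],
    "customer-c": [{"prefix": "203.0.113.0/24", "next_hop": "198.51.100.9",
                    "learned_from": "customer"}],
}
loc_rib, adj_rib_out, fib = build_ribs(adj_rib_in, prefer_customers,
                                       {"peer-p": only_customer_routes})
print(fib)  # {'203.0.113.0/24': '198.51.100.9'}
```

The same prefix appears twice in Adj-RIB-In and once in Loc-RIB. It may appear differently, or not at all, in each neighbour's Adj-RIB-Out, and it shrinks to a single next hop in the FIB.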

TCAM Is Not Infinite

High-performance routers often use TCAM, Ternary Content-Addressable Memory, to store forwarding entries for very fast lookup. TCAM is expensive, power-hungry, and finite. If you grow the internet table fast enough, old hardware eventually runs out of forwarding capacity long before it runs out of theoretical route-processing capability.

This is one reason route aggregation matters. If everyone announced only clean aggregates, the table would grow more slowly. In practice, deaggregation for traffic engineering, DDoS response, multihoming, and operational exceptions causes the table to expand beyond the tidy ideal.

12. Aggregation and Deaggregation: The Constant Tension

IP routing works best when address space is aggregated cleanly.

If a provider owns:

203.0.112.0/22

it is nicer for the global table if it announces:

203.0.112.0/22

instead of four separate /24s (203.0.112.0/24 through 203.0.115.0/24).

Aggregation reduces:

  • global route count
  • churn
  • memory pressure
  • convergence cost
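Python's standard ipaddress module can show the arithmetic. Four contiguous /24s collapse into one /22 only when they start on a /22 boundary, which means the third octet of the first network must be a multiple of 4. The prefixes below are illustrative.

```python
import ipaddress

four_24s = [ipaddress.ip_network(f"203.0.{third}.0/24") for third in range(112, 116)]
aggregate = list(ipaddress.collapse_addresses(four_24s))
print(aggregate)  # [IPv4Network('203.0.112.0/22')]

# 113 is not a multiple of 4, so 203.0.113.0/22 is not a valid /22:
# ip_network() rejects it because host bits are set.
try:
    ipaddress.ip_network("203.0.113.0/22")
except ValueError as err:
    print("not a valid /22:", err)
```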

Why Operators Still Deaggregate

Real networks often break aggregates into more specific routes for practical reasons:

  • to influence inbound traffic engineering
  • to send different parts of an address block to different upstreams
  • to mitigate DDoS attacks with targeted blackholing
  • to isolate operational issues

For example, a multihomed enterprise in Paris might advertise its aggregate to both providers, but advertise one more specific prefix only to provider A to attract traffic for one service stack over that provider.

This works because longest-prefix-match wins in the data plane.
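Here is a minimal sketch of longest-prefix match using Python's ipaddress module. The table is hypothetical: it shows what one remote network might hold after the enterprise advertises an aggregate to both providers and a more-specific prefix only to provider A. This remote network is assumed to prefer provider B for the aggregate. The linear scan is for clarity; real routers use tries or TCAM.

```python
import ipaddress

routes = {
    ipaddress.ip_network("198.51.100.0/23"): "via provider B",  # aggregate
    ipaddress.ip_network("198.51.100.0/24"): "via provider A",  # more specific
}


def lookup(destination: str) -> str:
    """Return the next hop of the longest matching prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in routes if addr in net]
    if not matches:
        raise LookupError(f"no route to {destination}")
    return routes[max(matches, key=lambda net: net.prefixlen)]


print(lookup("198.51.100.20"))  # via provider A (the /24 beats the /23)
print(lookup("198.51.101.20"))  # via provider B (only the /23 matches)
```

The /24 wins for its addresses whatever BGP preferred for the aggregate, which is exactly why deaggregation is such an effective steering tool.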

The result is a classic tragedy of the commons. Deaggregation is often locally rational and globally expensive.

13. Inbound vs Outbound Traffic Engineering

BGP gives operators a lot of control over outbound traffic and much weaker control over inbound traffic.

Outbound Traffic Engineering

This is the easy direction because it is entirely local. If your AS in Amsterdam learns two routes to a destination, you decide which one to prefer using:

  • local preference
  • hot potato versus cold potato routing strategy
  • interior cost to egress
  • policy communities

You own that decision.

Inbound Traffic Engineering

This is harder because you are trying to influence someone else's route selection process.

Common tools include:

  • AS path prepending
  • announcing more specific prefixes
  • provider-specific BGP communities
  • selective advertisement

AS path prepending means artificially lengthening your own AS path:

64500 64500 64500

in the hope that remote networks will prefer an alternative shorter path.

This is useful, but far from deterministic. If a remote network uses strong local preference rules, your prepending may have no effect at all.

Experienced operators treat inbound traffic engineering as persuasion, not control.

14. Route Reflectors Solve One Problem and Create Others

Route reflectors are essential for scaling iBGP, but they are not free.

Because a route reflector does not behave like a full-mesh iBGP topology, best-path visibility can differ across the network. Different clients may not see the same candidate paths at the same time. This can create path hiding.

Path Hiding

Suppose a reflector selects one best path for a prefix and only reflects that path to clients. A client may have preferred a different candidate if it had seen it, perhaps due to lower interior cost to the next hop. But it never gets the chance because the reflector suppressed the non-best alternatives.

This can lead to:

  • suboptimal exit selection
  • slower failover
  • uneven traffic distribution

Operators work around this with:

  • careful reflector placement
  • Add-Path, so a reflector can advertise more than one path per prefix
  • multiple reflector clusters
  • policy design that minimises pathological visibility gaps

As with so much of BGP, the protocol scales through compromise rather than perfect elegance.

15. Internet Exchanges and Private Interconnects

Much of the internet's interconnection happens at Internet Exchange Points, or IXPs. These are shared switching fabrics where many ASes can peer in one location rather than building individual private circuits to everyone.

An IXP in Frankfurt, Amsterdam, London, or Marseille may host hundreds of participants:

  • access ISPs
  • content networks
  • transit providers
  • clouds
  • mobile operators

This creates several peering models.

Bilateral Peering

Each pair of networks negotiates and configures a direct BGP session separately. This gives maximum control and explicit policy, but it becomes operationally heavy at large exchange fabrics.

Route Servers

Many IXPs run route servers. Participants can peer with the route server and receive routes from many other participants over that one logical relationship, while the actual forwarding still happens directly between the participant routers across the exchange fabric.

This is operationally efficient, but it requires careful filtering and policy transparency from the route server operator.

Private Interconnects

When traffic volume between two networks becomes large enough, they may bypass the shared exchange and build a private interconnect, often a direct cross-connect in a datacentre. This gives more predictable capacity and simpler bilateral control.

These choices, public exchange versus private interconnect, affect both economics and technical performance.

16. A Concrete Multihoming Example

Consider a SaaS provider in Berlin with ASN 64550 connected to two upstreams:

  • Provider A at DE-CIX Frankfurt
  • Provider B at AMS-IX Amsterdam

The company owns 198.51.100.0/23 and wants:

  • Frankfurt preferred for most outbound traffic
  • Amsterdam as backup
  • incoming traffic for one latency-sensitive service biased toward Frankfurt
  • some resilience if Frankfurt has problems

An operator might implement something like:

policy:
  local-pref(provider-a) = 200
  local-pref(provider-b) = 100

for outbound preference.

Then for inbound influence:

  • advertise the aggregate /23 to both providers
  • advertise 198.51.100.0/24 only to provider A
  • advertise 198.51.101.0/24 to both

Now remote networks that accept more specific routes may send traffic for 198.51.100.0/24 preferentially toward Frankfurt, while the aggregate still preserves reachability.

This is exactly the sort of thing that makes BGP powerful and messy. It is not shortest path routing. It is policy choreography.

17. Why BGP Troubleshooting Is Often Counterintuitive

When users report that one service is slow only from one ISP in Lisbon, the root cause may involve:

  • local preference inside a transit provider
  • a withdrawn route at one exchange
  • a stale route on one reflector
  • a more specific prefix being originated only in one region
  • RPKI invalid rejection in one part of the path
  • a next-hop reachability problem inside one AS

BGP troubleshooting requires looking at several distinct questions:

  1. Was the route learned?
  2. Was it accepted by policy?
  3. Did it win best-path selection?
  4. Was the next hop reachable?
  5. Was it actually exported to neighbours?
  6. Did remote networks choose it?

A BGP problem can exist even when the local router "has the route," because having a route is not the same as advertising it, preferring it, or forwarding traffic through it successfully.

18. Multiprotocol BGP: The Same Engine, More Than IPv4 Unicast

Although many engineers first encounter BGP through IPv4 internet routes, modern BGP is really a general path-distribution framework for multiple address families.

With Multiprotocol Extensions, MP-BGP can carry:

  • IPv6 unicast
  • VPNv4 and VPNv6 routes for MPLS L3VPNs
  • EVPN route types for data-centre and campus fabrics
  • multicast-related families in some environments

This matters because BGP's role expanded far beyond "the internet table."

VPNv4 and RD/RT Concepts

In MPLS VPN environments, multiple customers may use overlapping private address space such as 10.0.0.0/8. A plain IPv4 prefix is therefore not enough to distinguish customer routes. MP-BGP solves this with Route Distinguishers and Route Targets.

  • the Route Distinguisher makes prefixes unique in the control plane
  • the Route Target controls which VPN routes are imported into which VRFs

That means BGP is not just carrying reachability. It is carrying tenancy and policy metadata.
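The two roles can be sketched with hypothetical RD and RT values. The RD is prefixed to the customer route, so two customers' 10.0.0.0/8 stay distinct. A VRF then imports any route carrying at least one of its import Route Targets. Real VPNv4 routes carry binary RDs and carry RTs as extended communities; strings are used here to keep the idea visible. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VpnRoute:
    rd: str                  # Route Distinguisher, e.g. "64500:1"
    prefix: str              # customer prefix, may overlap between customers
    route_targets: frozenset

    @property
    def vpn_prefix(self) -> str:
        # RD + prefix is what keeps overlapping customer prefixes distinct in BGP.
        return f"{self.rd}:{self.prefix}"


def import_into_vrf(routes, import_rts):
    """A VRF imports every VPN route carrying at least one of its import RTs."""
    return [r for r in routes if r.route_targets & import_rts]


routes = [
    VpnRoute("64500:1", "10.0.0.0/8", frozenset({"64500:100"})),  # customer red
    VpnRoute("64500:2", "10.0.0.0/8", frozenset({"64500:200"})),  # customer blue
]
print(len({r.vpn_prefix for r in routes}))  # 2: same prefix, two distinct routes
print([r.vpn_prefix for r in import_into_vrf(routes, {"64500:100"})])
# ['64500:1:10.0.0.0/8']
```

The split of responsibilities is the point: the RD only guarantees uniqueness, while the RT alone decides which tenant sees which route.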

EVPN

In modern data-centre fabrics, EVPN uses BGP to distribute MAC reachability, IP-to-MAC bindings, inclusive multicast state, and related overlay information. That is a long way from the original picture of BGP as merely an inter-provider internet protocol.

The reason this works is that BGP, for all its flaws, is extremely good at:

  • incremental update propagation
  • policy attachment
  • scaling with route reflectors
  • expressing reachability in a structured attribute model

Once operators had that machinery, they reused it in many domains.

19. Failure Scenarios: What Actually Breaks First

BGP outages are often described too vaguely. In practice there are several recurring failure classes:

Session Failure

The TCP session to a neighbour drops because of:

  • physical link loss
  • ACL or firewall change
  • bad TTL security settings
  • MD5 mismatch
  • control-plane CPU overload

This is the cleanest failure mode. The neighbour is simply gone, routes are withdrawn, and alternative paths may take over.

Policy Failure

The session stays up, but a route-map, prefix-list, or community policy change causes routes to be:

  • unexpectedly denied
  • exported too widely
  • assigned the wrong local preference
  • tagged with a harmful community

These failures are harder because the BGP adjacency looks healthy while traffic behaviour is wrong.

Next-Hop Failure

A route is selected in BGP, but the next hop becomes unreachable in the IGP or underlay. Now the router may have a "good" BGP path that is not actually usable for forwarding. This is why BGP and interior reachability are inseparable in real operations.

Partial Visibility Failure

Route reflector path hiding, inconsistent policy deployment, or staged configuration rollouts can produce a state where different parts of the same AS see different realities. These are some of the most difficult problems to diagnose because no single router's view tells the whole story.

20. Anycast, CDNs, and Why BGP Affects User Experience Directly

BGP can feel abstract until you realise how many user-facing systems depend on it for performance.

Anycast works by advertising the same prefix from many locations. A resolver IP like 1.1.1.1, a CDN edge prefix, or a DDoS mitigation scrubbing centre may be announced from dozens or hundreds of sites. BGP then causes each user to reach a "nearby" instance according to routing policy.

Nearby here does not always mean geographically nearest. It means nearest according to the decisions of the internet's policy graph. A user in Lisbon may reach Madrid, Paris, or Amsterdam depending on peering and transit arrangements, not just kilometres on a map.

CDN troubleshooting often becomes BGP troubleshooting:

  • why did traffic suddenly start entering via Frankfurt instead of Milan?
  • why is packet loss higher only for one transit path?
  • why are users of one mobile carrier seeing 30 ms more latency?

The answer is often route selection, not application code.

21. The Real Security Model: Operational Discipline

It is tempting to ask why the internet still tolerates such a fragile trust model. The answer is partly historical and partly practical.

For decades, BGP's real security system has been layered operational discipline:

  • prefix filters
  • peer review of routing policy
  • route collectors and looking glasses
  • IRR hygiene
  • max-prefix limits
  • monitoring for sudden origin changes
  • alerting on unexpected path changes
  • gradually increasing RPKI enforcement

That is not cryptographic purity. It is operational defence-in-depth.

BGP stays upright not because the protocol is inherently trustworthy, but because network operators spend enormous effort compensating for its original assumptions.

22. Why BGP Still Exists in This Form

A fair question is why the industry has not simply replaced BGP with something cleaner and more secure.

The answer is that any replacement would need to preserve several properties simultaneously:

  • decentralised policy control
  • incremental deployability
  • interoperation across thousands of organisations
  • acceptable scaling for global route volume
  • support for business relationships, not just technical metrics

That is a brutal design constraint set.

A theoretically beautiful routing protocol that required coordinated global migration or centralised trust would fail politically even if it succeeded technically. BGP survives because it fits the institutional structure of the internet. It lets local actors make local decisions and only loosely coordinates the result.

That does not make it elegant. It makes it deployable.

23. What Operators Optimise Day to Day

In theory, BGP discussions drift toward security and architecture. In day-to-day operations, engineers are usually optimising a smaller set of practical outcomes:

  • keep sessions stable
  • keep route policy comprehensible
  • prevent accidental transit
  • minimise avoidable churn
  • steer traffic toward economical and performant exits
  • keep enough visibility to debug incidents quickly

That operational lens matters because BGP is often judged as if it were only an academic routing algorithm. It is not. It is a production control plane used by people who have to balance cost, safety, latency, and maintainability at the same time.

The best BGP designs are usually not the most clever. They are the ones whose policies can still be understood during an outage at 03:00 by the engineer who did not write the original configuration.

Simplicity Is a Feature

This is why experienced operators are suspicious of overly elaborate policy trees. If a route policy requires pages of exceptions, too many hidden communities, and heroic tribal knowledge, it will eventually fail under pressure. BGP rewards discipline more than cleverness.

The protocol is unforgiving in exactly this way. A simple export rule that everyone on the team understands is often safer than a "perfect" policy that only one architect can mentally simulate.

24. Failure Detection and External Visibility

BGP on its own is not especially fast at discovering that a path died. Traditional liveness depends on keepalives and hold timers, and those timers are often conservative because false positives are expensive: a 180-second hold time with 60-second keepalives is a common vendor default.

Real networks add other mechanisms.

BFD

Bidirectional Forwarding Detection, BFD, is commonly paired with BGP to detect failures much faster than BGP keepalives would. A private interconnect in Frankfurt carrying a lot of customer traffic might use BFD so a dead next hop is detected in tens of milliseconds rather than after a long hold timer expires.

This is useful, but not free. Aggressive BFD timers across many sessions increase control-plane load. Operators therefore use it where fast failover matters most rather than everywhere indiscriminately.
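The arithmetic behind that trade-off is simple. In BFD's asynchronous mode, the detection time is roughly the neighbour's transmit interval multiplied by a detect multiplier. The intervals below are illustrative, and the 180-second hold time is an assumption based on a common vendor default.

```python
def bfd_detection_ms(tx_interval_ms, multiplier):
    """In asynchronous mode, BFD declares the path down after `multiplier`
    consecutive transmit intervals pass without a packet from the neighbour."""
    return tx_interval_ms * multiplier


HOLD_TIME_S = 180  # assumption: a common vendor default BGP hold time

for interval_ms in (10, 50, 300):
    print(f"BFD {interval_ms} ms x 3: down after {bfd_detection_ms(interval_ms, 3)} ms")
print(f"BGP hold timer alone: up to {HOLD_TIME_S * 1000} ms")
```

Shorter intervals mean more packets per second per session, which is the control-plane cost the paragraph above warns about.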

Graceful Restart

Graceful Restart solves a different problem. Instead of handling a real path failure, it handles a control-plane restart where forwarding may still be intact. Neighbours temporarily preserve forwarding state while the restarting router rebuilds BGP information. That can turn a route processor restart from a visible outage into a survivable event.

Looking Glasses and Route Collectors

BGP troubleshooting is especially hard because no single router sees the whole picture. Operators therefore rely on:

  • looking glass servers
  • RIPE RIS
  • RouteViews
  • exchange route-server views

These external observation points answer questions like:

  • is my prefix visible globally?
  • which ASN is originating it?
  • is there a more-specific hijack?
  • does one geography see a different AS path?

Without external visibility, BGP operations become guesswork. Intent is not enough. You need evidence of what the rest of the internet actually sees.
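Those questions translate fairly directly into checks over collector data. Below is a minimal Python sketch that assumes the data has already been fetched from RIS, RouteViews, or a looking glass and flattened into (collector, prefix, AS path) records. The record shape, the function name, and the example ASNs are hypothetical, and no real collector API is used.

```python
"""Compare one prefix across several external route-collector views (toy data)."""
import ipaddress
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Observation(NamedTuple):
    collector: str                 # label for a collector peer (hypothetical)
    prefix: str                    # announced prefix as seen by that collector
    as_path: Tuple[int, ...]       # AS path; the last ASN is treated as the origin


def analyse(my_prefix: str, my_origin: int,
            observations: Iterable[Observation],
            collectors: Set[str]) -> Dict[str, object]:
    mine = ipaddress.ip_network(my_prefix)
    seen_by: Set[str] = set()
    origins: Set[int] = set()
    paths: Set[Tuple[int, ...]] = set()
    more_specifics: Set[Tuple[str, int]] = set()

    for obs in observations:
        net = ipaddress.ip_network(obs.prefix)
        if net.version != mine.version:
            continue
        origin = obs.as_path[-1]   # simplification: ignores AS_SET aggregation
        if net == mine:
            seen_by.add(obs.collector)
            origins.add(origin)
            paths.add(obs.as_path)
        elif net.subnet_of(mine) and origin != my_origin:
            # A more-specific from someone else wins longest-prefix match.
            more_specifics.add((str(net), origin))

    return {
        "missing_from": sorted(collectors - seen_by),        # globally visible?
        "unexpected_origins": sorted(origins - {my_origin}),  # who originates it?
        "foreign_more_specifics": sorted(more_specifics),     # more-specific hijack?
        "distinct_paths": len(paths),                         # views disagree?
    }
```

For example, if two collectors see 203.0.113.0/24 originated by AS64550 and a third sees only 203.0.113.128/25 originated by AS64666, the report lists the third collector as missing the prefix and flags the /25 as a foreign more-specific.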

25. Communities, Route Servers, and Policy at Scale

BGP communities are one of the most practical tools in the protocol. They let operators attach metadata to routes so downstream policy can act on it.

Typical uses include:

  • lowering local preference in one region
  • suppressing export to certain peers
  • requesting AS path prepends toward selected upstreams
  • triggering blackholing during DDoS mitigation

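This matters because large networks do not manage policy one prefix at a time by hand. They manage it through structured tags and predictable policy interpretation. Large Communities (RFC 8092) improved this further for a 4-byte ASN world: a standard community is a single 32-bit value conventionally written as a 16-bit ASN and a 16-bit value, so it cannot hold a 4-byte ASN, while a large community carries a full 32-bit ASN plus two 32-bit data fields.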
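The size difference and the "structured tags" idea can both be shown in a few lines. This is a minimal Python sketch: the parsers follow the standard (16:16) and large (32:32:32) formats, while the action scheme (OUR_ASN:1:peer to suppress export, OUR_ASN:2:n to prepend) and every name in it are hypothetical, standing in for whatever a real network documents for its customers.

```python
"""Standard vs large BGP communities, plus a hypothetical action scheme."""
from typing import Iterable, NamedTuple, Tuple

MAX_U16 = 0xFFFF
MAX_U32 = 0xFFFFFFFF


class StandardCommunity(NamedTuple):
    asn: int      # high 16 bits
    value: int    # low 16 bits

    @classmethod
    def parse(cls, text: str) -> "StandardCommunity":
        asn, value = (int(part) for part in text.split(":"))
        if not (0 <= asn <= MAX_U16 and 0 <= value <= MAX_U16):
            raise ValueError(f"{text!r} does not fit in two 16-bit fields")
        return cls(asn, value)

    def to_u32(self) -> int:
        # On the wire a standard community is a single 32-bit value.
        return (self.asn << 16) | self.value


class LargeCommunity(NamedTuple):
    global_admin: int   # usually the ASN that defines the meaning (32 bits)
    data1: int          # 32 bits, meaning chosen by that ASN
    data2: int          # 32 bits, meaning chosen by that ASN

    @classmethod
    def parse(cls, text: str) -> "LargeCommunity":
        parts = tuple(int(part) for part in text.split(":"))
        if len(parts) != 3 or not all(0 <= p <= MAX_U32 for p in parts):
            raise ValueError(f"{text!r} is not three 32-bit fields")
        return cls(*parts)


# Hypothetical scheme a network could publish for its customers:
#   OUR_ASN:1:<peer ASN>  -> do not export this route to that peer
#   OUR_ASN:2:<n>         -> prepend our ASN n extra times on export
OUR_ASN = 4200000001  # from the 4-byte private-use range, for illustration


def export_actions(communities: Iterable[LargeCommunity],
                   peer_asn: int) -> Tuple[bool, int]:
    """Return (suppress_export, extra_prepends) for one export session."""
    suppress, prepends = False, 0
    for c in communities:
        if c.global_admin != OUR_ASN:
            continue  # someone else's tag: not ours to interpret
        if c.data1 == 1 and c.data2 == peer_asn:
            suppress = True
        elif c.data1 == 2:
            prepends = max(prepends, c.data2)
    return suppress, prepends
```

Parsing "4200000001:100" as a standard community raises ValueError, which is the whole motivation for the large format. Well-known values such as the BLACKHOLE community 65535:666 from RFC 7999 still fit the standard format.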

Route Servers at Exchanges

Internet exchanges often run route servers so participants can exchange routing information without building a separate bilateral session with every other participant. That is operationally efficient, but it also means policy hygiene matters a lot. If route-server policy and filtering are sloppy, mistakes can spread quickly across a very large peering fabric.

Safety Controls

This is why defensive controls such as these matter so much:

  • max-prefix limits
  • strict import filters
  • route-server validation
  • alarms on sudden route-count changes

The protocol itself will happily carry a bad decision a very long way if operators do not stop it first.
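The first and last controls on that list are simple counters, which is exactly why they are worth having. A minimal Python sketch, assuming a per-session limit with a warning threshold (80% here) and a 2x change alarm. The names and thresholds are hypothetical; real routers expose these as vendor-specific configuration with their own semantics, for example whether the session restarts automatically after a teardown.

```python
"""Guardrails for one external BGP session (thresholds are illustrative)."""
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    WARN = "warn"            # raise an alarm, keep the session up
    TEARDOWN = "teardown"    # limit exceeded: stop accepting from this neighbour


def check_max_prefix(received: int, limit: int, warn_pct: int = 80) -> Verdict:
    if received > limit:
        return Verdict.TEARDOWN
    # Integer arithmetic avoids float rounding at the threshold.
    if received * 100 >= limit * warn_pct:
        return Verdict.WARN
    return Verdict.OK


def sudden_change(previous: int, current: int, max_ratio: float = 2.0) -> bool:
    """Flag a route count that grew or shrank by a factor of max_ratio or more."""
    if previous == 0:
        return current > 0
    ratio = current / previous
    return ratio >= max_ratio or ratio <= 1 / max_ratio
```

A customer that normally sends 100 prefixes and suddenly sends 250 trips the change alarm long before it reaches a generous hard limit, which is often the earliest sign that full routes are being leaked.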

26. Hot Potato, Cold Potato, and the Economics of Exit

One of the most practical BGP decisions inside a large network is where to hand traffic off.

With hot potato routing, a network hands traffic off as soon as practical, using the nearest egress and minimising its own backbone cost. In decision-process terms, this is what happens when the remaining ties between otherwise equal routes are broken by the lowest IGP cost to the BGP next hop. With cold potato routing, the network keeps traffic on its own backbone longer and hands it off closer to the destination, usually because it believes its internal path is better or because the economics justify it.

This is one reason forwarding paths can look odd from the outside. The route that seems geographically longer may still be the preferred one because BGP is expressing operator policy, not human intuition about maps.
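The difference comes down to which signal decides the exit. Below is a minimal Python sketch under simplifying assumptions: all candidate exits lead to the same neighbouring AS with equal local preference and AS path length, hot potato is modelled as lowest IGP cost, and cold potato as honouring the neighbour's MED first. The city names and metric values are hypothetical.

```python
"""Hot potato vs cold potato egress choice (toy model)."""
from typing import List, NamedTuple


class Exit(NamedTuple):
    city: str
    igp_cost: int      # our internal cost from the ingress router to this exit
    neighbor_med: int  # neighbour's MED: lower means "closer to the destination"


def hot_potato(exits: List[Exit]) -> Exit:
    # Leave our backbone as early as possible: lowest internal cost wins.
    return min(exits, key=lambda e: e.igp_cost)


def cold_potato(exits: List[Exit]) -> Exit:
    # Carry traffic towards the exit the neighbour says is nearest the
    # destination, breaking ties on our own internal cost.
    return min(exits, key=lambda e: (e.neighbor_med, e.igp_cost))


if __name__ == "__main__":
    # Traffic enters in Frankfurt; the neighbour serves the destination from Paris.
    exits = [Exit("Frankfurt", igp_cost=10, neighbor_med=50),
             Exit("Paris", igp_cost=40, neighbor_med=5)]
    print(hot_potato(exits).city)   # Frankfurt
    print(cold_potato(exits).city)  # Paris
```

Same neighbour, same AS path, two different handoff cities: the choice is entirely a matter of whose cost the network decides to minimise.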

27. Incident Response in Real Networks

When BGP goes wrong in production, the first job is not philosophical. It is practical:

  1. identify whether the issue is visibility, policy, or forwarding
  2. confirm whether the bad route is local or external
  3. compare views from route collectors and looking glasses
  4. decide whether to withdraw, de-preference, or filter

This sounds procedural because it is. BGP incident response rewards teams that already know:

  • which prefixes matter most
  • which neighbours are allowed to send what
  • how to reduce blast radius quickly
  • where external visibility comes from

Production BGP is inseparable from operational discipline. The protocol itself does not give you fast clarity. Your tooling and preparation do.

28. BGP Is Stable Because Operators Keep It Stable

BGP is often criticised, and much of the criticism is fair. But the protocol's real story is not just its design. It is the ecosystem of practices that grew around it:

  • prefix filtering
  • route collectors
  • communities
  • max-prefix guardrails
  • RPKI validation
  • route-server hygiene
  • years of incident response playbooks

BGP on paper is not enough. BGP in production is the protocol plus decades of defensive operational culture.

29. Internet Exchanges, Peering Fabrics, and Real Interconnection

Much of the internet's practical shape comes from internet exchanges. These are not abstract routing concepts. They are physical places and shared fabrics where networks decide whether and how to exchange traffic.

A large exchange in Amsterdam, Frankfurt, London, Paris, or Milan changes economics dramatically:

  • many peering opportunities in one place
  • lower cost than building separate direct links to everyone
  • reduced dependence on paid transit for some traffic classes

That in turn changes BGP policy. Once an AS has many possible peer paths available at an exchange, local preference, communities, and export policy become even more important because there are more economically distinct ways to reach the same destination.

Public Peering vs Private Interconnect

Not all large traffic relationships stay on a shared peering LAN. Once volume is high enough, networks often build private interconnects. That gives them:

  • more predictable capacity
  • simpler bilateral policy
  • less dependency on a shared exchange fabric

The important point is that BGP policy reflects physical and commercial interconnection design. It is not floating in a vacuum above the wires.

30. A Practical Multihoming Example

Consider a SaaS company in Berlin with ASN 64550 connected to:

  • transit provider A in Frankfurt
  • transit provider B in Amsterdam
  • one public peering exchange

It wants:

  • Frankfurt preferred for most outbound traffic
  • Amsterdam as backup
  • one latency-sensitive service to attract inbound traffic via Frankfurt when possible

The actual configuration strategy might include:

  • higher local preference for provider A
  • selective advertisement of a more-specific prefix toward A
  • communities requesting prepend toward B in some cases
  • monitoring from external route collectors to verify the intended path is actually visible

That is typical BGP work. Not theory, not shortest paths, but careful manipulation of policy tools to get acceptable real-world outcomes from an internet you do not control.
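Two of those tools can be checked with a toy model: local preference for outbound traffic, and longest-prefix match on the more-specific for inbound traffic. This Python sketch is hedged in several ways: the best-path function applies only the first two steps of the real decision process, the LOCAL_PREF values (200 and 100) are hypothetical, and the prefixes come from the IPv6 documentation range, with the more-specific drawn as a /48 because that is the longest IPv6 prefix widely accepted on the internet. Prepend communities and collector monitoring are not modelled.

```python
"""Toy model of the ASN 64550 multihoming plan (values are hypothetical)."""
import ipaddress
from typing import Dict, List, NamedTuple, Optional, Tuple


class Route(NamedTuple):
    via: str
    local_pref: int
    as_path_len: int


def best_path(routes: List[Route]) -> Route:
    # Simplified decision: higher LOCAL_PREF first, then shorter AS path.
    # Real routers apply several more tie-breakers after these two.
    return max(routes, key=lambda r: (r.local_pref, -r.as_path_len))


def inbound_entry(dest: str, announcements: Dict[str, List[str]]
                  ) -> Optional[Tuple[str, List[str]]]:
    """Longest-prefix match a remote network would perform on our announcements."""
    addr = ipaddress.ip_address(dest)
    matches = [ipaddress.ip_network(p) for p in announcements
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    best = max(matches, key=lambda n: n.prefixlen)
    return str(best), announcements[str(best)]


# Outbound: provider A gets LOCAL_PREF 200, provider B keeps a default-like 100.
OUTBOUND = [
    Route("provider-A-Frankfurt", local_pref=200, as_path_len=3),
    Route("provider-B-Amsterdam", local_pref=100, as_path_len=2),
]

# Inbound: the aggregate goes to both providers, the more-specific only to A.
ANNOUNCEMENTS = {
    "2001:db8:100::/40": ["provider-A", "provider-B"],
    "2001:db8:100::/48": ["provider-A"],  # latency-sensitive service lives here
}
```

Provider A wins outbound despite its longer AS path, because local preference is evaluated first; remove it from the candidate list and provider B takes over as the backup. Inbound, an address inside the /48 can only be reached via provider A while that announcement is visible, whereas the rest of the /40 stays reachable through either provider.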

31. Route Leaks Are Usually Boring Mistakes With Huge Blast Radius

The scariest thing about route leaks is how ordinary they usually are. Many are not sophisticated attacks. They are configuration mistakes:

  • a customer exports full routes learned from one upstream to another
  • a peer-learned route is accidentally passed where only customer routes should go
  • a route server policy is wrong

The reason the impact is so large is that BGP is built to distribute accepted information very efficiently. Once a bad advertisement clears the first few policy boundaries, it can travel far.

That is why defensive posture matters so much. The protocol is exposed not only to malicious abuse of trust but to ordinary operational trust: a boring mistake in one network can become everyone else's problem very quickly.
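All three mistakes in that list break the same commonly cited export rule, often called valley-free or Gao-Rexford routing: routes learned from peers or providers go only to customers. A minimal Python sketch, assuming each route is already tagged with the relationship of the neighbour it came from. In practice that tagging is usually done with communities on import, and RFC 9234 adds BGP roles and an Only-to-Customer attribute for the same purpose. All names here are hypothetical.

```python
"""Valley-free export check and a tiny leak audit (toy model)."""
from enum import Enum
from typing import Dict, List, Optional


class Rel(Enum):
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def may_export(learned_from: Optional[Rel], export_to: Rel) -> bool:
    # Our own routes (learned_from=None) and customer routes may go anywhere.
    if learned_from is None or learned_from is Rel.CUSTOMER:
        return True
    # Peer- and provider-learned routes may only be sent down to customers.
    return export_to is Rel.CUSTOMER


def find_leaks(rib: Dict[str, Optional[Rel]], neighbour: Rel) -> List[str]:
    """Prefixes that an export to `neighbour` would leak."""
    return sorted(p for p, learned in rib.items() if not may_export(learned, neighbour))
```

The classic customer leak is the call may_export(Rel.PROVIDER, Rel.PROVIDER): from the customer's point of view both upstreams are providers, so the answer is False, and an export filter built on this rule would have stopped it.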

32. BGP Is a Human Coordination System

At internet scale, BGP is not only routing logic. It is a human coordination system expressed through routing logic. Operators publish policy, negotiate peering, define filters, exchange communities, register prefixes, and maintain trust relationships that the protocol then turns into forwarding behaviour.

That is why BGP incidents are so often half technical and half organisational. The packet path is shaped by people, contracts, and operational habits as much as by RFC text.

33. Why It Is So Hard to Replace

Any proposed replacement has to beat BGP not just as an algorithm, but as a deployable social system. It has to support:

  • independent policy
  • partial rollout
  • mixed trust
  • economic asymmetry
  • enormous installed base inertia

That is a brutal requirement set. Many cleaner ideas fail there even if they look better on a whiteboard.

34. Global Routing Is a Continuous Negotiation

The internet's routing table is not a static map. It is a continuous negotiation between networks with different goals, different costs, and different tolerances for risk. BGP works because it allows that negotiation to remain local while still producing a usable global result most of the time.

That is not elegant in the mathematical sense. It is elegant in the political sense, which is why it survived.

35. What BGP Is Really Optimising

BGP is often described as old, insecure, and awkward. All of that is true. It is also one of the few protocols that has successfully scaled across the global internet for decades while letting independent organisations retain control over their own policy.

That is the key tradeoff. BGP does not optimise for elegance. It optimises for autonomy.

Each AS can decide:

  • what to originate
  • what to accept
  • what to prefer
  • what to export
  • which neighbours to trust

The internet works because these local decisions usually align well enough to create global reachability.

When BGP fails, it fails in memorable ways because the trust assumptions are thin and the blast radius can be enormous. But when it works, it quietly solves an extraordinarily difficult coordination problem: it lets thousands of networks with different incentives build one shared routing system without central control.

BGP remains both terrifying and indispensable.

The protocol survives because the internet is not a centrally designed machine. It is a federation, and BGP is one of the few systems flexible enough to let that federation keep arguing while still exchanging packets.

36. What Competent BGP Operations Look Like in Practice

The cleanest way to understand BGP is to watch what careful operators actually do day to day. In mature networks, good BGP operations usually include:

  • prefix filters built from customer intent, not generic trust
  • max-prefix limits on every external session
  • route-origin validation where possible
  • external visibility from route collectors and looking glasses
  • documented community policies that other networks can understand
  • rehearsed rollback procedures for policy mistakes

The point is not perfection. The point is to make failure smaller and more visible.
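Route-origin validation is one of the few items on that list that is itself a small, well-defined algorithm. A minimal Python sketch of the RFC 6811 outcome rules, assuming the validated ROA payloads (VRPs) have already been fetched and verified by a relying-party validator. The type names and example data are hypothetical.

```python
"""Route-origin validation outcome per the RFC 6811 rules (sketch)."""
import ipaddress
from enum import Enum
from typing import Iterable, NamedTuple


class Vrp(NamedTuple):
    prefix: str       # ROA prefix, e.g. "203.0.113.0/24"
    max_length: int   # longest prefix length the ROA authorises
    asn: int          # authorised origin AS


class State(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NOT_FOUND = "not-found"


def validate(route_prefix: str, origin_asn: int, vrps: Iterable[Vrp]) -> State:
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vnet = ipaddress.ip_network(vrp.prefix)
        # A VRP only "covers" routes of the same family inside its prefix.
        if vnet.version != route.version or not route.subnet_of(vnet):
            continue
        covered = True
        # AS0 VRPs never match any route (RFC 6483), so they can only invalidate.
        if vrp.asn != 0 and vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return State.VALID
    # Covered but never matched means Invalid; no covering VRP means NotFound.
    return State.INVALID if covered else State.NOT_FOUND
```

With a single VRP of 203.0.113.0/24, maxLength 24, AS64550, the /24 from AS64550 is Valid, the same /24 from any other origin is Invalid, and a /25 is Invalid even from AS64550 because the ROA never authorised that length. A tight maxLength is what turns a forged more-specific into something validating networks will drop.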

Here is the deeper lesson: most catastrophic BGP incidents are not caused by the protocol suddenly behaving in a mysterious new way. They happen when ordinary operational safeguards were missing, outdated, or bypassed. A route leak is often just policy without enough boundaries. A hijack that propagates too far is often validation and filtering that were never fully deployed. A convergence problem is often hardware pressure or timer policy exposed by a failure elsewhere.

This is why senior network engineers tend to sound conservative. They are not being timid. They are responding to a system where small mistakes can escape local scope very quickly. The best operators narrow that escape path. They publish what they intend to announce. They register the right objects. They validate peers before trusting them. They use communities consistently. They monitor from outside their own network because internal belief is not the same thing as global visibility.

BGP reliability is largely an operational achievement. The RFCs matter, the attributes matter, the finite-state machine matters, but the internet stays usable because thousands of engineers keep applying discipline to an undisciplined protocol. That is less glamorous than a new algorithm. It is also the reason packets still find their way across a network of networks that nobody fully controls.

37. Why Normal Users End Up Caring About BGP Anyway

Most users will never type the phrase "local preference" or "route reflector," but BGP still reaches them. When a video stream buffers because traffic takes a bad path, when a major cloud region appears unreachable from one ISP but fine from another, when an anycast DNS service suddenly feels distant, BGP is often somewhere in the causal chain.

That is worth stating clearly because routing can sound abstract. It is not abstract when it decides whether your packets exit through a congested interconnect, whether your traffic reaches the closest CDN edge, or whether a route leak drags part of the internet through the wrong network for twenty minutes. BGP policy becomes user experience very quickly.

This is one reason the protocol deserves more respect than it usually gets outside networking circles. BGP is not just "the thing carriers use." It is one of the background systems that determines whether the internet feels local, stable, and boring or fragile, distant, and strange.

38. The Internet Runs on Policy More Than Path Length

A newcomer might assume the global internet mostly routes according to shortest path. BGP is the long-running proof that this is not the right mental model. The internet routes according to policy first and path length only within the boundaries that policy allows.

That sounds inefficient until you remember what problem the protocol is solving. The goal is not to compute one globally optimal graph. The goal is to let independent networks preserve their own commercial, security, and operational constraints while still participating in a shared system. BGP looks awkward because the real world is awkward.

That is the uncomfortable brilliance of BGP. It does not force consensus about what the internet should be. It only forces enough cooperation for packets to continue finding a path through disagreement.

39. Why BGP Rewards Operators Who Are Slightly Paranoid

If there is one healthy instinct in BGP work, it is controlled paranoia. Assume a peer can misannounce something by accident. Assume a route object can be stale. Assume the path you intended to export is not the path the rest of the internet is actually seeing until you verify it.

That mindset is not cynicism. It is the right operating model for a protocol built on partial trust and local policy.

The best BGP teams behave accordingly. They filter narrowly. They document policy. They check outside visibility before and after changes. They prefer boring safety controls over cleverness. Most of the time, that caution buys nothing visible. Then one bad advertisement appears and all the boring work suddenly looks like wisdom.

That is one of the recurring themes in interdomain routing: the operators who look least dramatic are often the ones keeping the largest failures from spreading.

In BGP, caution is not a personality trait. It is part of the job description.