How HTTP/2 Actually Works
HTTP/2 is often explained with one sentence: it multiplexes many requests over one TCP connection. That sentence is true, but it is not enough to explain why the protocol exists, why browsers and servers had to change so much to support it, or why HTTP/3 still replaced it on performance-critical paths a decade later.
HTTP/1.1 already supported persistent connections. It could reuse one TCP connection for many requests. The real problem was that the protocol still treated the connection as a serial byte stream with no native message interleaving. Browsers worked around that by opening several parallel TCP connections per origin, sharding static assets across hostnames, concatenating JavaScript, inlining CSS, and using image sprites to reduce request count. Those were not elegant optimisations. They were coping strategies for a protocol that made concurrency awkward.
HTTP/2 changed the shape of the problem. Instead of sending textual requests and responses directly on the TCP stream, it turned the connection into a framed transport with independent logical streams. Each stream carried one request-response exchange, and the bytes for different streams could be interleaved safely because every frame included a stream identifier. That let the browser fetch HTML, CSS, JavaScript, fonts, and images together without opening six parallel TCP connections to the same origin.
This article looks at the mechanics beneath that change: the binary framing layer, stream lifecycle, HPACK header compression, flow control, stream prioritisation, server push, ALPN negotiation, and the awkward fact that all of this still sits on top of one ordered TCP byte stream. We will also connect those internals to real operator behaviour, browser tuning, CDN deployments in London and Frankfurt, and the specific reasons HTTP/2 improved the web while still keeping one major performance pathology alive.
HTTP/1.1 Was Limited by Message Ordering, Not by a Lack of Reuse
HTTP/1.1 introduced persistent connections and request pipelining, so on paper it already knew how to reuse a TCP session. In practice, pipelining was rarely enabled because responses still had to come back in order. If the browser queued five requests and the first one generated a slow response, the following four were trapped behind it even if the server could have produced them immediately. This was application-layer head-of-line blocking.
A simple example shows the problem. Imagine a browser in Athens requesting:
- /index.html
- /app.css
- /app.js
- /logo.svg
On a single HTTP/1.1 connection, the server could not send the bytes for /app.css ahead of /index.html if the browser had pipelined requests and expected ordered responses. Browsers therefore defaulted to multiple TCP connections, typically six per origin, so slow resources did not block unrelated ones.
That workaround had costs:
- more TCP handshakes
- more TLS handshakes, each costing two round trips before TLS 1.3
- more kernel socket state on both sides
- worse congestion behaviour because each connection had its own congestion window
- more pressure on ephemeral ports and middleboxes
The browser community spent years building performance guidance around these limitations. "Domain sharding" spread assets across img.example.eu, static.example.eu, and cdn.example.eu so the browser could legally open more parallel sockets. Bundling reduced request count but made cache invalidation worse. Inlining reduced round trips but bloated the HTML. None of this was ideal. It was just rational behaviour under HTTP/1.1's constraints.
HTTP/2's framing layer was designed to remove the need for these workarounds while preserving the basic HTTP semantics of methods, headers, status codes, URIs, and cache behaviour.
HTTP/2 Starts with a Binary Framing Layer
The most important design shift in HTTP/2 is that HTTP messages are no longer transmitted as plain textual request and response blocks on the wire. Instead, the connection carries a sequence of binary frames. Each frame has a fixed 9-byte header followed by a payload.
Conceptually, a frame looks like this:
+-----------------------------------------------+
| Length (24) | Type (8) | Flags (8) |
+-----------------------------------------------+
| R | Stream Identifier (31) |
+-----------------------------------------------+
| Frame Payload (variable) |
+-----------------------------------------------+
Those fields give the protocol the machinery HTTP/1.1 lacked:
- Length says how many payload bytes follow
- Type says what kind of frame this is
- Flags refine behaviour for that frame type
- Stream Identifier says which logical stream this frame belongs to
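The 9-byte header described above can be packed and parsed in a few lines. This is a minimal sketch, not a full frame codec: the function names are illustrative, and it assumes the caller has already read exactly nine bytes from the connection.

```python
import struct

FRAME_HEADER_LEN = 9


def pack_frame_header(length: int, frame_type: int, flags: int, stream_id: int) -> bytes:
    """Serialise the fixed header: 24-bit length, type, flags, R bit + 31-bit stream ID."""
    if not 0 <= length < 2**24:
        raise ValueError("length must fit in 24 bits")
    if not 0 <= stream_id < 2**31:
        raise ValueError("stream ID must fit in 31 bits")
    # struct has no 24-bit field, so the length is split into a high byte
    # and a low 16-bit word. "!" selects network byte order with no padding.
    return struct.pack("!BHBBI", length >> 16, length & 0xFFFF, frame_type, flags, stream_id)


def parse_frame_header(header: bytes) -> tuple[int, int, int, int]:
    """Return (length, type, flags, stream_id) from a 9-byte frame header."""
    if len(header) != FRAME_HEADER_LEN:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    high, low, frame_type, flags, raw_stream = struct.unpack("!BHBBI", header)
    # The top bit of the last word is the reserved R bit and is ignored on receipt.
    return (high << 16) | low, frame_type, flags, raw_stream & 0x7FFFFFFF
```

For example, `pack_frame_header(16384, 0x0, 0x1, 3)` describes a 16 KB DATA frame (type 0x0) on stream 3 with the END_STREAM flag (0x1) set.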
The connection is therefore one byte stream at the TCP layer, but many logical conversations at the HTTP layer. A browser can send request headers on stream 1, request headers on stream 3, response body data on stream 1, and response headers on stream 5, all interleaved safely because every frame states its stream ID explicitly.
This is why HTTP/2 is sometimes described as a framing protocol plus familiar HTTP semantics. The method is still GET, the authority is still the host, the status code is still 200, but those fields now travel in structured frames rather than raw ASCII blocks.
The most common frame types are:
- HEADERS for request or response header blocks
- DATA for body bytes
- SETTINGS for connection-level parameters
- WINDOW_UPDATE for flow control credit
- PING for liveness and RTT measurement
- RST_STREAM to abort one stream
- GOAWAY to close the connection gracefully
That frame vocabulary is the real foundation of HTTP/2. Multiplexing is a consequence of the framing model.
Streams Are Independent Logical Channels Inside One Connection
A stream is a bidirectional logical channel inside the shared connection. Every HTTP request-response exchange gets its own stream ID. Clients use odd-numbered stream IDs; servers use even-numbered ones when they initiate streams, which in practice mattered mostly for server push.
The lifecycle is not complicated, but it is important:
- the client opens a stream by sending HEADERS
- either side may send DATA frames if the message has a body
- END_STREAM marks one direction as finished
- the stream transitions through open, half-closed, and closed states
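The lifecycle above can be sketched as a small state machine seen from the client's side. This is a simplified model: the event names are invented for illustration, and the reserved states used by server push are left out.

```python
from enum import Enum


class StreamState(Enum):
    IDLE = "idle"
    OPEN = "open"
    HALF_CLOSED_LOCAL = "half-closed (local)"
    HALF_CLOSED_REMOTE = "half-closed (remote)"
    CLOSED = "closed"


# Hypothetical event names; the real protocol expresses these as frames and flags.
_TRANSITIONS = {
    (StreamState.IDLE, "send_headers"): StreamState.OPEN,
    (StreamState.OPEN, "send_end_stream"): StreamState.HALF_CLOSED_LOCAL,
    (StreamState.OPEN, "recv_end_stream"): StreamState.HALF_CLOSED_REMOTE,
    (StreamState.HALF_CLOSED_LOCAL, "recv_end_stream"): StreamState.CLOSED,
    (StreamState.HALF_CLOSED_REMOTE, "send_end_stream"): StreamState.CLOSED,
}


def next_state(state: StreamState, event: str) -> StreamState:
    """Advance one stream by one event; RST_STREAM closes it from any state."""
    if event == "reset":
        return StreamState.CLOSED
    try:
        return _TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.value}") from None
```

A bodyless GET walks idle, open, half-closed (local), closed: the client sends HEADERS with END_STREAM, then the server's final frame carries END_STREAM back.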
For a simple GET, the browser might send:
HEADERS stream=1 END_HEADERS END_STREAM
The server might then reply:
HEADERS stream=1 END_HEADERS
DATA stream=1
DATA stream=1 END_STREAM
At the same time, the browser can already have streams 3, 5, and 7 open for other resources. The server can send bytes from all of them in an interleaved order such as:
HEADERS stream=1
HEADERS stream=3
DATA stream=1
HEADERS stream=5
DATA stream=3
DATA stream=5
DATA stream=1 END_STREAM
That interleaving is the practical performance win. If one response is large, the others do not have to wait for it to finish before the server emits their first bytes.
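On the receiving side, interleaving is undone simply by grouping frames on their stream ID. The sketch below assumes frames have already been parsed into `(type, stream_id, payload, end_stream)` tuples, which is an illustrative representation rather than any library's API.

```python
from collections import defaultdict


def demultiplex(frames):
    """Reassemble per-stream bodies from an interleaved frame sequence.

    Returns (bodies, finished): the body bytes for each stream ID and the
    order in which streams saw END_STREAM.
    """
    bodies = defaultdict(bytearray)
    finished = []
    for frame_type, stream_id, payload, end_stream in frames:
        if frame_type == "DATA":
            bodies[stream_id] += payload
        if end_stream:
            finished.append(stream_id)
    return {sid: bytes(body) for sid, body in bodies.items()}, finished


# Hypothetical payloads for three concurrent responses.
wire = [
    ("HEADERS", 1, b"", False),
    ("HEADERS", 3, b"", False),
    ("DATA", 1, b"<html>", False),
    ("DATA", 3, b"body{}", True),
    ("DATA", 1, b"</html>", True),
]
bodies, finished = demultiplex(wire)
```

Stream 3 finishes before stream 1 here even though stream 1 started first, which is exactly what ordered HTTP/1.1 responses could not do.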
This model also changes server design. Under HTTP/1.1, one connection often mapped to one in-flight response. Under HTTP/2, one TLS session may contain dozens of concurrent streams, which means the HTTP stack, TLS stack, event loop, and prioritisation logic all need to coordinate correctly. Reverse proxies such as nginx, Envoy, HAProxy, and CDN edge servers had to learn stream scheduling instead of only socket scheduling.
The Connection Begins with a Preface and SETTINGS Exchange
HTTP/2 does not simply start speaking frames without agreement. The client sends a connection preface followed by a SETTINGS frame. The preface is a fixed 24-byte string sent at the start of every HTTP/2 connection, and it is easiest to see when testing direct cleartext h2c:
PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n
In ordinary HTTPS use, browsers negotiate HTTP/2 during the TLS handshake using ALPN, Application-Layer Protocol Negotiation. If ALPN agrees on h2, both sides know the encrypted connection will carry HTTP/2 frames, and the client sends the same preface inside the TLS session.
Immediately after connection setup, each side exchanges SETTINGS. These advertise local preferences and limits such as:
- maximum frame size
- initial flow control window
- maximum concurrent streams
- whether server push is enabled
- header table size for HPACK
That exchange matters because HTTP/2 is full of bounded state. A client does not want an origin to open unlimited streams. A server does not want a client to send arbitrarily large header blocks. Both sides need to know the initial credit for flow control before they start sending much data.
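A SETTINGS payload is just a list of 6-byte entries, each a 16-bit identifier followed by a 32-bit value. The sketch below encodes and decodes that payload only (not the surrounding frame header); the identifiers are the ones defined by the HTTP/2 specification, while the values in the usage note are arbitrary examples.

```python
import struct

# Setting identifiers from the HTTP/2 specification.
SETTINGS_HEADER_TABLE_SIZE = 0x1
SETTINGS_ENABLE_PUSH = 0x2
SETTINGS_MAX_CONCURRENT_STREAMS = 0x3
SETTINGS_INITIAL_WINDOW_SIZE = 0x4
SETTINGS_MAX_FRAME_SIZE = 0x5

# The fixed client connection preface that precedes the first SETTINGS frame.
CLIENT_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


def encode_settings(settings: dict[int, int]) -> bytes:
    """Encode {identifier: value} as consecutive 6-byte entries."""
    return b"".join(struct.pack("!HI", ident, value) for ident, value in settings.items())


def decode_settings(payload: bytes) -> dict[int, int]:
    """Decode a SETTINGS payload; its length must be a multiple of 6."""
    if len(payload) % 6 != 0:
        raise ValueError("SETTINGS payload length must be a multiple of 6")
    return {ident: value for ident, value in struct.iter_unpack("!HI", payload)}
```

A client that wants no pushed streams and at most 100 concurrent ones could send `encode_settings({SETTINGS_ENABLE_PUSH: 0, SETTINGS_MAX_CONCURRENT_STREAMS: 100})`.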
In practice, you can see the negotiation with tools like:
curl -I --http2 https://example.eu
openssl s_client -alpn h2 -connect example.eu:443
The ALPN result is one of the easiest ways to confirm whether a site is really serving HTTP/2 at the edge, especially when a CDN terminates TLS in Paris or Frankfurt and speaks some different protocol toward the origin behind it.
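The same ALPN check can be scripted with Python's standard `ssl` module. This is a rough sketch: the helper names are illustrative, the host you pass in is your own choice, and the result describes only the hop that terminates TLS.

```python
import socket
import ssl


def make_alpn_context() -> ssl.SSLContext:
    """Build a verifying TLS context that offers h2 first, then HTTP/1.1."""
    context = ssl.create_default_context()
    context.set_alpn_protocols(["h2", "http/1.1"])
    return context


def negotiated_protocol(host: str, port: int = 443, timeout: float = 5.0):
    """Connect, complete the TLS handshake, and return the ALPN result.

    Returns "h2", "http/1.1", or None if the server selected nothing.
    """
    context = make_alpn_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

Calling `negotiated_protocol("example.eu")` performs a real network connection, so it is best run from the network path you actually want to test.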
HPACK Solved Repetitive Headers Without Reintroducing Compression Side Channels Blindly
Header compression was one of the less visible but highly practical parts of HTTP/2. Web requests carry a lot of repetitive metadata:
- :method
- :scheme
- :authority
- user-agent
- accept
- cookie
- cache-control
Under HTTP/1.1, every request sent these as plain text again and again. That wasted bandwidth, especially on high-latency or mobile links where every byte still matters during the first round trips.
HTTP/2 uses HPACK for header compression. HPACK combines:
- a static table of common header names and values
- a dynamic table built during the connection
- Huffman encoding for literal strings
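Common fields such as :method: GET and :scheme: https already sit in the static table, so even the first request can send each of them as a single indexed byte. Connection-specific fields such as :authority, user-agent, and cookie values are added to the dynamic table when first sent, and later requests can refer to those indexed entries instead of retransmitting the full strings. Repeated cookies and common response headers compress well once both sides share table state.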
A simplified example:
Request 1 headers:
:method: GET
:scheme: https
:authority: assets.example.eu
:path: /app.css
Request 2 headers:
:method: GET
:scheme: https
:authority: assets.example.eu
:path: /app.js
The second request can reference existing table entries for everything except :path and transmit only the changed path as a literal.
This reduced overhead is why HTTP/2 helped pages with many small requests. The benefit was not only fewer TCP connections. It was also less repeated header text on each request.
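The table lookups behind that saving can be modelled with a toy encoder. This is deliberately not the HPACK wire format: it ignores Huffman coding, table size limits, eviction, and name-only matches, and the class name is invented. The static indices shown are a subset of the real static table, and dynamic entries start at index 62 with the newest entry first.

```python
# A subset of the HPACK static table (header field -> index).
STATIC_TABLE = {
    (":method", "GET"): 2,
    (":method", "POST"): 3,
    (":path", "/"): 4,
    (":path", "/index.html"): 5,
    (":scheme", "http"): 6,
    (":scheme", "https"): 7,
}
FIRST_DYNAMIC_INDEX = 62


class ToyHpackEncoder:
    """Illustrative model of static and dynamic table indexing only."""

    def __init__(self):
        self.dynamic = []  # newest entry first, as in HPACK

    def encode(self, headers):
        out = []
        for field in headers:
            if field in STATIC_TABLE:
                out.append(("indexed", STATIC_TABLE[field]))
            elif field in self.dynamic:
                out.append(("indexed", FIRST_DYNAMIC_INDEX + self.dynamic.index(field)))
            else:
                # Sent as a literal and remembered for later requests.
                out.append(("literal", field))
                self.dynamic.insert(0, field)
        return out


encoder = ToyHpackEncoder()
common = [(":method", "GET"), (":scheme", "https"), (":authority", "assets.example.eu")]
first = encoder.encode(common + [(":path", "/app.css")])
second = encoder.encode(common + [(":path", "/app.js")])
```

In this model the first request sends two literals (:authority and :path), while the second sends only one, because :authority is now a dynamic table reference.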
HPACK was also designed in the shadow of earlier compression side-channel concerns such as CRIME. The protocol separates header compression state from message bodies and gives implementations control over dynamic table size so they can limit memory and risk. Even so, operators still had to think carefully about cross-request compression behaviour, especially where attacker-controlled input and secrets could coexist in compressed header contexts.
Flow Control Exists at Both the Stream and Connection Level
Multiplexing introduces a new risk: one sender could overwhelm the receiver with data on one stream and starve memory or buffering for the rest. HTTP/2 therefore includes explicit flow control for DATA frames.
Two independent windows exist:
- a per-stream window
- a connection-wide window
Each side advertises how much DATA it is willing to receive. Sending consumes window credit. Receiving and processing data lets the receiver grant more credit with WINDOW_UPDATE.
Suppose a server starts with an initial window of 65,535 bytes for each stream and for the overall connection. If it sends 16 KB of response body on stream 3, both the stream-level and connection-level windows shrink. Once the client consumes those bytes, it may send WINDOW_UPDATE to enlarge the windows again.
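The arithmetic in that example can be sketched as a small sender-side accounting class. The class and method names are illustrative; the only protocol facts assumed are the 65,535-byte default window and that a WINDOW_UPDATE on stream 0 applies to the whole connection.

```python
class FlowControl:
    """Sender-side view of the peer's stream and connection windows."""

    def __init__(self, initial_window: int = 65_535):
        self.initial_window = initial_window
        self.connection_window = initial_window
        self.stream_windows = {}

    def window_for(self, stream_id: int) -> int:
        # Streams start with the initial window the first time they are used.
        return self.stream_windows.setdefault(stream_id, self.initial_window)

    def sendable(self, stream_id: int) -> int:
        """DATA bytes that may be sent now: the smaller of the two windows."""
        return max(0, min(self.connection_window, self.window_for(stream_id)))

    def send_data(self, stream_id: int, nbytes: int) -> None:
        if nbytes > self.sendable(stream_id):
            raise ValueError("would exceed the flow-control window")
        self.stream_windows[stream_id] = self.window_for(stream_id) - nbytes
        self.connection_window -= nbytes

    def window_update(self, stream_id: int, increment: int) -> None:
        """Apply a received WINDOW_UPDATE; stream 0 means the connection."""
        if stream_id == 0:
            self.connection_window += increment
        else:
            self.stream_windows[stream_id] = self.window_for(stream_id) + increment
```

After `send_data(3, 16_384)` both windows drop to 49,151 bytes, and a new stream 5 can also send only 49,151 bytes until the connection window is replenished, even though its own stream window is untouched.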
This matters for real workloads:
- a large video segment should not starve small CSS and JavaScript responses
- a slow client should not force unbounded buffering in the server
- one stalled stream should not permanently block the rest if scheduling is sane
Flow control is not congestion control. Congestion control belongs to TCP underneath. HTTP/2 flow control is an application-layer backpressure mechanism. It tells the peer how much data the HTTP stack is ready to accept, not what the network path can sustain.
That distinction is operationally useful. If a transfer is slow, the bottleneck might be:
- TCP congestion window growth
- receiver flow control window exhaustion
- server prioritisation choices
- CDN edge buffering
- origin fetch latency
All of those can produce "slow HTTP/2" symptoms, but they live in different layers.
Prioritisation Looked Powerful on Paper and Messy in Production
HTTP/2 includes a stream prioritisation system. A client can express that one stream depends on another and can assign weights so the server knows which responses should receive bandwidth first. In theory, this lets the browser tell the server that CSS and blocking JavaScript matter more than below-the-fold images.
The model was a dependency tree with weights from 1 to 256. A stream could depend on another stream, optionally exclusively, and the server could schedule bytes accordingly.
A conceptual example:
stream 1: HTML
stream 3: CSS depends on 1, weight 220
stream 5: JS depends on 1, weight 180
stream 7: IMG depends on 1, weight 20
In theory, the server should make fast progress on streams 3 and 5 before spending much effort on stream 7.
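To make the weights concrete: streams that share a parent are meant to receive capacity in proportion to their weights once the parent no longer needs it. The helper below is a worked calculation under that assumption, not a scheduler, and its name is illustrative.

```python
def bandwidth_shares(weights: dict[int, int]) -> dict[int, float]:
    """Map sibling stream IDs to their proportional share of capacity."""
    for stream_id, weight in weights.items():
        if not 1 <= weight <= 256:
            raise ValueError(f"stream {stream_id}: weight must be between 1 and 256")
    total = sum(weights.values())
    return {stream_id: weight / total for stream_id, weight in weights.items()}


# The CSS, JS, and image streams from the example, all depending on stream 1.
shares = bandwidth_shares({3: 220, 5: 180, 7: 20})
```

With those weights the image on stream 7 is entitled to 20/420 of the capacity, a little under 5%, while the CSS and JavaScript split the rest.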
In practice, prioritisation was one of the least successful parts of HTTP/2. Reasons included:
- browsers changed their prioritisation strategies over time
- many servers and proxies implemented the tree only partially
- some CDNs flattened or ignored priorities under load
- origin fetch delays often dominated whatever clever scheduling the edge wanted to do
Operators discovered that "supports HTTP/2 prioritisation" did not guarantee that the whole delivery chain honoured it meaningfully. A browser in Berlin might send one dependency tree, the CDN edge might simplify it, the reverse proxy might buffer responses differently, and the application server might produce bytes in some unrelated order.
This is one reason modern performance work often focuses more on resource hints, critical CSS, caching, and transport-level latency reduction than on carefully tuning HTTP/2 priority trees.
Server Push Tried to Beat the Browser to the Next Request
Server push was one of HTTP/2's most ambitious features. The idea was simple: if the server knows the HTML response will immediately cause the browser to request /app.css, it can push that resource proactively without waiting for the browser to ask.
Mechanically, the server sends a PUSH_PROMISE on an existing stream, reserving a new stream that represents the pushed request, then follows with the pushed response headers and data.
The model seemed attractive, but it struggled in production:
- the server often guessed wrong about what the browser actually needed
- the browser cache might already hold the asset
- push bandwidth could crowd out more important responses
- CDN and proxy support varied
- debugging was harder than explicit preload hints
A push that saves one round trip in theory can waste bandwidth in practice if the browser already has the object or if the user never renders the route that would have needed it. In an age of strong caching and increasingly sophisticated preload mechanisms, push often did more harm than good.
Browsers and server vendors gradually backed away from it for that reason. HTTP/2 server push is now effectively dead in mainstream web performance engineering. The lesson is useful: not every protocol feature that looks latency-friendly survives contact with real caches, CDNs, and user agents.
HTTP/2 Still Suffers from TCP Head-of-Line Blocking
HTTP/2 solved application-layer head-of-line blocking. It did not solve transport-layer head-of-line blocking.
All streams on one HTTP/2 connection still share one TCP session. TCP delivers bytes in order. If one TCP segment is lost, the receiver cannot pass later bytes up to the application until the missing bytes are retransmitted and delivered in sequence.
This matters because HTTP/2 interleaves frames for many streams on the same TCP byte stream. A lost segment may therefore delay progress for every active stream, not just the one whose data was conceptually "first".
Imagine stream 3 carries CSS and stream 7 carries an image. Their frames are interleaved:
segment A: DATA stream 3
segment B: DATA stream 7
segment C: DATA stream 3
If segment B is lost, TCP cannot present segment C to the HTTP/2 layer until B is retransmitted, even though stream 3 is logically independent. This is TCP head-of-line blocking.
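The effect can be sketched by modelling TCP's in-order delivery rule. Numbering segments A, B, and C as 0, 1, and 2 is an assumption for illustration; real TCP tracks byte sequence numbers, not whole segments.

```python
def deliverable(received: set[int], next_expected: int = 0) -> list[int]:
    """Return the segments TCP can hand to the application, in order.

    Delivery stops at the first gap, no matter what has arrived after it.
    """
    delivered = []
    while next_expected in received:
        delivered.append(next_expected)
        next_expected += 1
    return delivered


# Segment B (1) is lost: C (2) has arrived but stays stuck in the receive buffer.
before_retransmit = deliverable({0, 2})
# Once B is retransmitted, everything is released at once.
after_retransmit = deliverable({0, 1, 2})
```

Before the retransmission only segment A reaches the HTTP/2 layer, so stream 3's second DATA frame waits on a segment that carried stream 7's bytes.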
That is the central reason HTTP/3 moved HTTP onto QUIC over UDP. QUIC keeps reliability, congestion control, and encryption, but it orders delivery per stream rather than per connection, so a missing packet delays only the streams whose data it carried instead of forcing unrelated streams to wait behind it.
HTTP/2 was still a major improvement over HTTP/1.1 because it removed the need for many parallel TCP sessions. But on lossy mobile or Wi-Fi paths, one lost TCP segment can still stall the entire multiplexed connection. That was acceptable for a while, not ideal forever.
The Best Performance Gains Usually Came from Fewer Connections and Better Header Efficiency
When HTTP/2 launched, some commentary implied it would make websites automatically fast. That was never realistic. The practical wins came from specific mechanisms:
- fewer TCP and TLS handshakes
- better use of one warm congestion window
- lower header overhead via HPACK
- interleaving of small critical resources
- removal of the need for many HTTP/1.1 workarounds
The biggest improvement often appeared on pages with many small assets. A site serving thirty objects from one origin over HTTPS could eliminate a lot of redundant setup and queuing. Large single-object downloads often changed less because they were already dominated by transfer size rather than request concurrency.
This is also why some old optimisation advice became harmful under HTTP/2:
- domain sharding could reduce efficiency by forcing extra connections
- aggressive bundling could hurt cache granularity
- image sprites became less valuable
Performance teams had to unlearn some habits. A site tuned for HTTP/1.1 might need fewer hostnames and more natural asset separation once HTTP/2 arrived.
In practice, CDNs helped accelerate that transition. An edge platform in Amsterdam or Frankfurt could terminate HTTP/2 for browsers, reuse fewer but hotter origin connections, and hide some complexity from application teams. Many organisations "adopted HTTP/2" first at the CDN edge rather than by redesigning their entire origin stack.
Intermediaries Changed the Meaning of One End-to-End Connection
The web is full of intermediaries:
- browser to CDN edge
- CDN edge to regional shield
- shield to origin proxy
- proxy to application server
HTTP/2 is negotiated hop by hop, not magically end to end across every intermediary. A browser may speak HTTP/2 to a CDN edge in London. That edge may speak HTTP/1.1 to the origin, or HTTP/2, or gRPC over HTTP/2, depending on configuration. The user only sees the first hop.
This matters when people say "our site supports HTTP/2". The statement is true only for a specific segment unless you inspect the whole chain. The browser's experience depends heavily on the client-facing edge hop, so the claim is still useful, but it does not mean the entire backend stack is fully multiplexed.
Operators therefore need to think about buffering and protocol translation carefully. If the edge accepts many browser streams concurrently but serialises origin fetches poorly, the theoretical gain shrinks. If the CDN coalesces requests and caches effectively, the origin may never notice. If the edge proxies gRPC, long-lived streams behave differently again.
HTTP/2 improved the interface between browsers and edge infrastructure first. Backend adoption was more selective and workload-dependent.
HTTP/2 Has Its Own Operational Failure Modes
The protocol is mature, but not trivial. Common operational issues include:
Too Many Concurrent Streams
If clients or load testers open many streams at once, the server's advertised MAX_CONCURRENT_STREAMS becomes important. Too low and the browser queues unnecessarily. Too high and memory pressure rises.
Large Header Blocks
Modern cookies and tracing headers can become enormous. HPACK helps, but implementations still need limits for decompression state and total header size to avoid abuse and memory exhaustion.
Mis-tuned Proxies
Reverse proxies that buffer too aggressively, ignore priorities, or translate inefficiently can erase much of HTTP/2's benefit.
Long-Lived Streams and Fairness
Streaming responses, gRPC workloads, and large downloads can interact badly with smaller latency-sensitive requests if flow control and scheduling are not handled well.
Debugging Complexity
Raw packet captures are harder to eyeball than HTTP/1.1 because the wire format is binary and usually encrypted. Tools such as browser developer panels, nghttp, Envoy stats, and TLS key log files become more important.
Those tools show frame-level behaviour more clearly than ordinary access logs.
A Real Page Load Looks Different Under HTTP/1.1 and HTTP/2
The easiest way to understand HTTP/2's practical benefit is to walk through the same page load under both models.
Imagine a site with:
- one HTML document
- one CSS file
- one JavaScript bundle
- two font files
- six images above and below the fold
Assume the browser is talking to an edge in Frankfurt from Athens and the RTT to that edge is 45 ms. Under HTTP/1.1, the browser might open six TCP connections to the same origin because that is how it avoids application-layer queuing. Each one needs:
- a TCP handshake
- a TLS handshake
- request transmission
- response delivery
Even if the browser pipelines little or nothing, the socket pool has to warm up and the congestion windows have to grow separately. The browser can spread objects across those sockets, but the distribution is approximate. One socket may get the large JavaScript bundle and spend much of its congestion window there, while another carries small font or CSS responses. The result is workable, not elegant.
Under HTTP/2, the browser usually wants one hot connection, sometimes two in corner cases, and then opens many streams on top of it. The setup cost is paid once. Every additional resource mostly adds:
- a HEADERS frame
- some response scheduling on the server
- a stream-level lifecycle
That means the bottleneck shifts from "how many sockets are legal and warmed up" to "how well do both sides schedule frames on one shared connection". The HTML may still dominate discovery because the browser cannot ask for resources it has not parsed yet, but once it knows what is needed, it can burst many dependent requests quickly.
A rough comparison:
HTTP/1.1
conn 1 -> HTML
conn 2 -> CSS
conn 3 -> JS
conn 4 -> font 1
conn 5 -> font 2
conn 6 -> image 1
later reuse -> image 2, 3, 4, 5, 6
HTTP/2
conn 1
stream 1 -> HTML
stream 3 -> CSS
stream 5 -> JS
stream 7 -> font 1
stream 9 -> font 2
stream 11 -> image 1
stream 13 -> image 2
...
This matters even more under TLS because handshakes are expensive relative to small objects. HTTP/2 reduces the proportion of page-load time spent on connection setup and leaves more of the budget for actual content transfer. It does not make bad origin latency disappear, but it does stop the browser from wasting so much effort reopening equivalent transport state.
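A back-of-the-envelope calculation shows the setup cost for the 45 ms path in this example. It assumes full handshakes with no session resumption, 0-RTT, or TCP Fast Open: one round trip for TCP, then one for TLS 1.3 or two for TLS 1.2. The function names are illustrative.

```python
def setup_latency_ms(rtt_ms: float, tls_round_trips: int) -> float:
    """Time before the first request can be sent on a new connection."""
    return rtt_ms * (1 + tls_round_trips)


def total_handshake_round_trips(connections: int, tls_round_trips: int) -> int:
    """Handshake round trips spent across all connections combined."""
    return connections * (1 + tls_round_trips)


RTT_MS = 45
tls13_wait = setup_latency_ms(RTT_MS, tls_round_trips=1)
tls12_wait = setup_latency_ms(RTT_MS, tls_round_trips=2)
http1_total = total_handshake_round_trips(connections=6, tls_round_trips=1)
http2_total = total_handshake_round_trips(connections=1, tls_round_trips=1)
```

Because the six HTTP/1.1 connections open in parallel, the wall-clock wait is similar in both models (90 ms with TLS 1.3, 135 ms with TLS 1.2). The difference is that HTTP/1.1 performs twelve round trips' worth of handshake exchanges and starts six cold congestion windows, where HTTP/2 performs two and starts one.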
Connection Coalescing Quietly Reduced the Need for Domain Sharding
One of the more subtle browser behaviours that HTTP/2 made possible is connection coalescing. If several origins resolve to the same IP address and the certificate covers them, the browser may reuse one HTTP/2 connection for more than one hostname.
This sounds minor. In practice it helped undo years of HTTP/1.1-era asset sharding habits.
Suppose a site serves:
- www.example.eu
- static.example.eu
- img.example.eu
If all three point at the same CDN edge address and the certificate's SAN list covers them, the browser may treat one TLS connection as suitable for all of them. That means one warm socket, one congestion window, and one set of HTTP/2 streams rather than three mostly separate pools.
The constraints matter:
- the certificate must authorise all relevant names
- the server must be authoritative for those names
- the browser must judge the connection reusable under its security rules
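The first two constraints can be approximated in code. This is a simplified sketch of the decision, not what any particular browser implements: the function names are invented, the IP address is from a documentation range, and real browsers apply further checks of their own.

```python
def san_matches(pattern: str, hostname: str) -> bool:
    """Check one certificate SAN entry against a hostname.

    A leading "*." wildcard covers exactly one left-most label.
    """
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        first_label, _, rest = hostname.partition(".")
        return bool(first_label) and rest == pattern[2:]
    return pattern == hostname


def can_coalesce(hostname, resolved_ips, connection_ip, cert_sans) -> bool:
    """True if an existing connection looks reusable for this hostname."""
    same_address = connection_ip in resolved_ips
    covered = any(san_matches(san, hostname) for san in cert_sans)
    return same_address and covered


# Hypothetical existing connection to www.example.eu.
sans = ["www.example.eu", "*.example.eu"]
reusable = can_coalesce("img.example.eu", {"203.0.113.10"}, "203.0.113.10", sans)
```

If `img.example.eu` resolved to a different address, or the certificate listed only `www.example.eu`, the same call would return False and the browser would open a separate connection.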
When that works, the old reason for deliberate domain sharding weakens sharply. Under HTTP/1.1, sharding was a way to trick the browser into opening more connections. Under HTTP/2, sharding can backfire by splitting work across several congestion windows and reducing the benefits of multiplexing and header compression.
This is one reason performance teams had to revisit long-lived folklore. Advice that was correct in 2013 could become counterproductive later. A stack optimised for HTTP/2 often prefers fewer origins, cleaner certificates, and simpler asset layout rather than a maze of parallel hostnames.
CDNs were especially important here. Edge platforms could serve many customer hostnames from the same anycast fleet and often from the same edge processes. That made coalescing practical and pushed more sites toward simpler delivery topology.
HTTP/2 Changed Backend RPC Design, Not Just Browsers
Public web pages drove the headlines, but HTTP/2 also changed internal service design. Once engineers had:
- multiplexed streams
- binary framing
- bidirectional flow control
- one long-lived TLS connection
it became natural to use the protocol for RPC systems. gRPC is the most obvious example. A client and server can keep one HTTP/2 connection open and run many RPC calls across separate streams, including server streaming and bidirectional streaming patterns.
This works well, but it also exposes parts of HTTP/2 that many website teams never notice. For example:
Long-Lived Streams
A streaming RPC can stay open for minutes or hours. That changes fairness and buffer management compared with short-lived asset requests.
Backpressure
HTTP/2 flow control becomes central. If one consumer is slow, the sender must honour the stream and connection windows rather than buffering endlessly.
Cancellation
RST_STREAM becomes part of normal application behaviour rather than an error oddity. Clients cancel RPC calls, deadlines expire, and the transport needs to tear down only the relevant stream without harming the rest of the connection.
Observability
Socket-level metrics are no longer enough because one connection may hide dozens or hundreds of independent calls. The operator needs stream-level insight.
This is one reason HTTP/2 grew beyond "the browser web protocol". It provided a structured transport that application designers could build on. The lessons learned there also explain why many backend teams care deeply about maximum concurrent streams, per-connection memory limits, and fairness under streaming workloads. Those are not theoretical concerns once the protocol becomes the substrate for service meshes and internal APIs.
HTTP/2 Resource Exhaustion Attacks Taught Operators That Streams Are Cheap, Not Free
When HTTP/2 shipped, some teams assumed that one TCP connection was automatically easier to defend than many. The reality was more nuanced. A multiplexed connection reduces handshake overhead, but it also concentrates a lot of logical work into one transport session. That means per-connection state matters more than before.
The clearest reminder came from the Rapid Reset attack pattern disclosed in 2023. The short version is that a client could open streams and cancel them very quickly with RST_STREAM, forcing servers or proxies to do significant stream setup and teardown work at high rates. The attack did not require extraordinary bandwidth. It exploited the fact that stream lifecycle handling still consumes CPU and memory.
The important lesson is not the exploit detail. The important lesson is architectural:
- stream creation is not free
- header decompression is not free
- scheduler bookkeeping is not free
- cancellations can still leave the server finishing work it had already started
A robust HTTP/2 stack therefore needs:
- sane stream concurrency limits
- fast cancellation paths
- defensive header size limits
- fair scheduler behaviour under abuse
- edge filtering that recognises pathological stream churn
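One of those defences, recognising pathological stream churn, can be sketched as a sliding-window counter of RST_STREAM frames per connection. The class name and thresholds are hypothetical; production proxies use their own heuristics and limits.

```python
from collections import deque


class ResetRateGuard:
    """Flag a connection that cancels streams faster than a set budget."""

    def __init__(self, max_resets: int, window_seconds: float):
        self.max_resets = max_resets
        self.window_seconds = window_seconds
        self._events = deque()

    def record_reset(self, now: float) -> bool:
        """Record one RST_STREAM at time `now` (in seconds).

        Returns True when the number of resets inside the sliding window
        exceeds the budget, meaning the connection should be closed.
        """
        self._events.append(now)
        # Drop resets that have aged out of the window.
        while self._events and now - self._events[0] > self.window_seconds:
            self._events.popleft()
        return len(self._events) > self.max_resets


guard = ResetRateGuard(max_resets=3, window_seconds=1.0)
verdicts = [guard.record_reset(t) for t in (0.0, 0.1, 0.2, 0.3)]
```

The fourth reset inside one second trips the guard, while an occasional cancellation from a user navigating away never comes close to the budget.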
This is another place where intermediaries matter. A CDN or reverse proxy can absorb or reject malicious behaviour before it reaches the origin, but only if the intermediary's own HTTP/2 implementation is hardened as well.
The broader point is that HTTP/2 removed some costs and introduced new ones. It made application concurrency cleaner. It also created new places where cheap logical actions can trigger non-trivial state transitions. Mature deployments account for both facts.
Browser Scheduling Still Matters Because the Wire Is Not the Whole Story
A common mistake is to treat HTTP/2 performance as purely a server concern. The browser still decides:
- which requests to issue first
- when discovered resources are high priority
- whether a preconnect or preload happens
- whether a resource is render-blocking
- which origin socket pool to reuse
HTTP/2 gave browsers more flexibility. It did not remove the need to spend that flexibility wisely.
Consider fonts. If the browser discovers a font late because the CSS arrived late, multiplexing alone will not save first render. Consider images. If the browser decides below-the-fold images are low priority, the server may never need to spend bytes on them early even though multiplexing would allow it. Consider service workers. Cached or intercepted resources may not hit the network at all, which changes the apparent value of transport-level optimisation.
This is why browser network panels remain so useful. They show that HTTP/2 is one layer in a wider scheduling system. If the waterfall still looks bad after enabling h2, the problem may be:
- discovery order
- origin compute delay
- cache misses
- render-blocking CSS
- oversized JavaScript
HTTP/2 helps the transport path. It does not redesign the page architecture for you.
Browsers, CDNs, and Origins All Adapted Their Strategies Around HTTP/2
Browsers became more willing to keep one hot connection per origin and open streams opportunistically. CDNs tuned edge stacks to multiplex efficiently and terminate many browser sessions at scale. Origin operators revisited old asset strategies and sometimes discovered that their HTTP/1.1-era sharding and bundling rules now made performance worse.
The most successful HTTP/2 deployments usually shared a few characteristics:
- ALPN and TLS configuration were correct and modern
- CDN or reverse proxy support was mature
- domain sharding was reduced
- cache policy was clean so repeat requests stayed at the edge
- large headers and cookies were kept under control
Put differently, HTTP/2 worked best when the organisation treated it as part of an end-to-end delivery design, not as a box to tick in a TLS configuration template.
Debugging HTTP/2 in Production Usually Means Correlating Several Views at Once
HTTP/1.1 problems were often easier to spot from plain text request logs and packet captures. HTTP/2 is more layered. The useful debugging workflow usually combines:
- browser waterfall or netlog
- edge proxy stream counters
- origin latency metrics
- TLS negotiation data
- packet captures only when necessary
Typical questions include:
- did the client negotiate h2 at all
- how many streams were open concurrently
- did one stream monopolise the connection
- were flow-control windows exhausted
- did a proxy downgrade or buffer oddly between edge and origin
For example, a user may report that CSS arrives late only on one network path. The browser panel may show the HTTP/2 connection is healthy, but the origin metrics may reveal that the CDN shield is serialising origin fetches under pressure. Another case may show good origin timings but a bad client experience because one lossy LTE path triggers TCP-level stalls on the shared connection. The HTTP layer and transport layer both have to be inspected.
This is the other reason HTTP/2 never became invisible infrastructure. It improved things enough that most users stopped thinking about it, but the engineers running large services still need to reason about frame scheduling, header pressure, and loss on one shared transport. That is a better problem set than HTTP/1.1 gave them, not a trivial one.
Not All Frame Types Matter Equally, but the Less Common Ones Still Shape Behaviour
Most explanations of HTTP/2 focus on HEADERS and DATA because those carry the visible request and response. In practice, several less glamorous frame types shape connection health and performance:
SETTINGS
This frame defines the local operating envelope. If one side allows only a small number of concurrent streams or advertises a small initial flow-control window, the entire connection behaves differently from the beginning.
WINDOW_UPDATE
Without these frames, high-throughput transfers would stall quickly. They are the credit-replenishment mechanism that lets the sender continue once the receiver has processed enough data.
RST_STREAM
Cancellation is normal in browsers. Users navigate away, speculative requests become unnecessary, and preloaded resources lose value. RST_STREAM lets the peer stop one logical exchange without tearing down the entire connection.
GOAWAY
This is the graceful connection shutdown signal. It tells the peer the highest stream ID that might have been processed, so the peer knows that any higher-numbered streams were never accepted and can be retried safely on a new connection.
PING
Operators use this for liveness and RTT measurement at the HTTP/2 layer. It does not replace TCP keepalives or network telemetry, but it is often useful in long-lived application sessions.
Understanding these frames helps explain production behaviour. For example, graceful deploys at a reverse proxy often involve sending GOAWAY so existing streams can finish while new streams migrate elsewhere. Similarly, a high rate of RST_STREAM events in logs may not mean failure. It may mean a browser is reprioritising aggressively or a gRPC client is timing out calls quickly.
The frame model also explains why HTTP/2 implementations need careful state accounting. Even if the payload volume is low, the control-plane churn of streams opening, updating windows, being cancelled, and shutting down can still be significant.
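To make that churn visible, it helps to count frames by type rather than by payload bytes. The sketch below decodes the fixed 9-byte frame header (24-bit length, 8-bit type, 8-bit flags, one reserved bit, 31-bit stream ID) and tallies frame types in a buffer of already-decrypted frames. The function names are illustrative, and the buffer is assumed to start on a frame boundary.

```python
import struct
from collections import Counter

# Frame type codes as defined by the HTTP/2 specification.
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


def parse_frame_header(header: bytes):
    """Decode a 9-byte frame header into (length, type name, flags, stream_id)."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    length_hi, length_lo, type_code, flags, raw_stream = struct.unpack(">BHBBI", header)
    length = (length_hi << 16) | length_lo   # 24-bit payload length
    stream_id = raw_stream & 0x7FFFFFFF      # the top bit is reserved
    name = FRAME_TYPES.get(type_code, f"UNKNOWN_0x{type_code:02x}")
    return length, name, flags, stream_id


def count_frame_types(buffer: bytes) -> Counter:
    """Walk concatenated frames and count how many of each type appear."""
    counts = Counter()
    offset = 0
    while offset + 9 <= len(buffer):
        length, name, _flags, _stream = parse_frame_header(buffer[offset:offset + 9])
        counts[name] += 1
        offset += 9 + length  # skip the payload to reach the next header
    return counts
```

A capture dominated by WINDOW_UPDATE and RST_STREAM with little DATA is the kind of pattern this tally surfaces quickly.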
Flow Control Becomes Easier to See with a Concrete Example
Flow control feels abstract until you put numbers on it. Suppose a server is sending:
- a 1.8 MB JavaScript bundle on stream 5
- a 24 KB CSS file on stream 3
- a 12 KB font manifest on stream 7
Assume the client's initial stream window is 65,535 bytes and the connection window is also 65,535 bytes.
If the server starts by pouring data into stream 5 only, it exhausts the available credit quickly. After 65,535 bytes on stream 5, both that stream's window and the shared connection-level window are at zero, so the server cannot send DATA on any stream, including the small CSS and font responses, until the client processes some bytes and sends WINDOW_UPDATE.
That creates a subtle but important scheduling consequence. A sender that cares about latency should not dump all available credit into the largest object first. It should send enough to keep the pipe busy while still leaving room for smaller critical streams.
For that reason, flow control and prioritisation were always linked in practice, even though they are distinct mechanisms in the spec. The sender has to decide not only what it is allowed to send, but what it should send with the currently available credit.
This is also why a badly implemented HTTP/2 stack can feel worse than HTTP/1.1 in edge cases. If the stack fills the connection window with one bulky stream and is slow to send or process WINDOW_UPDATE frames, small important responses can feel strangely sluggish even though multiplexing exists on paper. The protocol permits good behaviour. The implementation has to deliver it.
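The example numbers above can be run through a small model. The sketch below computes how many bytes each stream receives before any WINDOW_UPDATE arrives, comparing a sender that drains streams one after another with one that interleaves frames across streams. It is a simplified simulation, not a real scheduler: object sizes are rounded to decimal bytes, the frame size is the protocol default of 16,384 bytes, and the function name `first_flight` is made up for illustration.

```python
def first_flight(sizes, order, interleave, conn_window=65_535,
                 stream_window=65_535, max_frame=16_384):
    """Return bytes sent per stream before any WINDOW_UPDATE is received."""
    remaining = dict(sizes)
    stream_credit = {sid: stream_window for sid in sizes}
    sent = {sid: 0 for sid in sizes}
    conn_credit = conn_window

    def send_one_frame(sid):
        nonlocal conn_credit
        # A DATA frame is limited by frame size, bytes left, and both windows.
        n = min(max_frame, remaining[sid], stream_credit[sid], conn_credit)
        remaining[sid] -= n
        stream_credit[sid] -= n
        conn_credit -= n
        sent[sid] += n
        return n

    if interleave:
        progress = True
        while progress:  # one frame per stream per round until credit runs out
            progress = False
            for sid in order:
                if send_one_frame(sid) > 0:
                    progress = True
    else:
        for sid in order:  # drain each stream fully before moving on
            while send_one_frame(sid) > 0:
                pass
    return sent


# Assumed sizes: 1.8 MB bundle, 24 KB CSS, 12 KB font manifest (decimal bytes).
SIZES = {5: 1_800_000, 3: 24_000, 7: 12_000}
greedy = first_flight(SIZES, order=[5, 3, 7], interleave=False)
fair = first_flight(SIZES, order=[5, 3, 7], interleave=True)
```

Under these assumptions the greedy sender spends the whole connection window on stream 5 and sends nothing on streams 3 and 7, while the interleaving sender completes the font manifest and most of the CSS in the same first flight.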
HTTP/2 Changed How People Think About "One Connection per Origin"
Before HTTP/2, one connection per origin sounded conservative. During HTTP/2 adoption, it started to sound efficient. In the HTTP/3 era, teams often run HTTP/2 and HTTP/3 side by side and have to think carefully about protocol mix.
This matters because one origin may now expose:
- HTTP/1.1 for very old clients
- HTTP/2 for most HTTPS traffic over TCP
- HTTP/3 for clients that support QUIC and prefer it
That means "connection strategy" is no longer one static rule. Browsers and edge systems decide dynamically based on:
- ALPN results
- previous protocol success
- network conditions
- certificate and origin coalescing rules
- whether QUIC is blocked or degraded
HTTP/2 therefore became part of a broader lesson for transport engineers: the web works best when the client can keep a small number of hot, high-quality paths rather than constantly reopening new ones. That lesson outlived HTTP/2 itself and carried directly into QUIC design.
Many HTTP/1.1-Era Frontend Tricks Became Technical Debt Under HTTP/2
When a site migrates to HTTP/2, the old performance hacks do not disappear automatically. They often linger:
- giant JavaScript bundles justified by "fewer requests"
- CSS merged into monoliths
- image sprites that are awkward to maintain
- many sharded asset hostnames
- brittle preload strategies designed around limited socket counts
These were rational optimisations once. Under HTTP/2 they can become liabilities.
A 2 MB JavaScript bundle that once saved ten request setups may now delay parsing, caching, and execution far more than it helps transport efficiency. A sprite sheet that once reduced request overhead may now hurt rendering and cache granularity. A thicket of static hostnames may defeat connection reuse and complicate certificate management.
This is one reason real HTTP/2 migrations sometimes disappointed teams at first. They enabled h2, saw only modest improvement, and concluded the protocol was overhyped. In reality, the application was still shaped by assumptions from an earlier transport world.
The transport can remove one class of bottleneck. The asset and page design still need to meet it halfway.
Why HTTP/2 Never Used Cleartext h2c on the Public Web
The spec allows a cleartext variant, usually called h2c, where HTTP/2 runs without TLS. In principle a client and server can upgrade from HTTP/1.1 or start with prior knowledge. In practice the public web almost never adopted that path.
There were several reasons.
First, HTTPS became the default expectation for almost every serious website. Browsers, search engines, and platform vendors steadily pushed the web toward encryption not only for privacy, but also for integrity. Once HTTPS was the baseline, the practical deployment path for HTTP/2 naturally ran through ALPN inside TLS.
Second, intermediaries were already difficult enough. Adding another public upgrade path with mixed cleartext behaviour created little benefit compared with the simplicity of "if it is modern and public, it is almost certainly HTTPS with ALPN".
Third, the biggest visible gains from HTTP/2 on the web were closely tied to encrypted transport anyway. Reducing repeated TLS setup, coalescing hot secure connections, and using one well-managed browser socket all fit naturally into the HTTPS model that the web was already converging on.
For those reasons, h2c survives mostly in specialised internal environments, test setups, and some proxy-to-proxy use cases rather than on ordinary sites. The important lesson is that HTTP/2 did not succeed as a generic framing layer in the abstract. It succeeded as the secure default transport for modern HTTPS delivery. The protocol and the web's wider security posture moved together.
A Concrete Frame Walkthrough Makes the Wire Format Easier to Trust
The binary framing layer sounds abstract until you watch one simple request-response exchange at frame level.
Suppose a browser requests:
GET /app.css HTTP/2
Host: static.example.eu
On the wire, the browser does not send those textual lines as an HTTP/1.1 block. It sends a HEADERS frame on a new stream. The pseudo-headers and ordinary headers are HPACK-encoded into a header block fragment, and the frame header declares:
- type = HEADERS
- stream ID = 3
- flags = END_HEADERS | END_STREAM
The browser can set END_STREAM because a simple GET has no request body.
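The 9-byte header for that HEADERS frame can be written out directly. The sketch below packs the fields described above. The HPACK-encoded header block is not produced here, so the payload length of 20 bytes is a placeholder assumption, and `frame_header` is an illustrative helper name.

```python
import struct

HEADERS = 0x1      # frame type code for HEADERS
END_STREAM = 0x1   # flag: no more frames from this side on the stream
END_HEADERS = 0x4  # flag: the header block ends in this frame


def frame_header(length, frame_type, flags, stream_id):
    """Pack the fixed 9-byte HTTP/2 frame header."""
    if not 0 <= length < 2 ** 24:
        raise ValueError("payload length must fit in 24 bits")
    if not 0 <= stream_id < 2 ** 31:
        raise ValueError("stream ID must fit in 31 bits")
    return struct.pack(">BHBBI", length >> 16, length & 0xFFFF,
                       frame_type, flags, stream_id)


# Assumed: the HPACK-encoded header block fragment is 20 bytes long.
header = frame_header(20, HEADERS, END_HEADERS | END_STREAM, 3)
print(header.hex(" "))  # 00 00 14 01 05 00 00 00 03
```

Reading the bytes left to right gives a length of 0x14, type 0x01, flags 0x05, and stream 3, which is all a peer needs to route the payload to the right stream before decoding any headers.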
The server may answer with:
- HEADERS stream 3, status 200
- DATA stream 3, first bytes of CSS
- DATA stream 3, remaining bytes with END_STREAM
If another object is needed at the same time, the server can insert its frames between those DATA frames. So a connection transcript might look like:
C -> S HEADERS stream=1 GET /index.html
C -> S HEADERS stream=3 GET /app.css
C -> S HEADERS stream=5 GET /app.js
S -> C HEADERS stream=1 :status 200
S -> C DATA stream=1 "<!doctype html>..."
S -> C HEADERS stream=3 :status 200
S -> C DATA stream=3 "body{...}"
S -> C HEADERS stream=5 :status 200
S -> C DATA stream=5 "(()=>{...})"
That one example explains most of the protocol's practical value. The browser can issue related requests immediately, and the server can begin each response as soon as it has useful bytes rather than waiting for the earlier response to complete fully.
This is also why observability tools that can decode frames are so valuable. Ordinary access logs tell you that the requests existed. Frame-aware tools tell you how the connection actually scheduled them.
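A frame-aware view boils down to regrouping an interleaved transcript by stream ID. The sketch below does that for the transcript above, represented as plain tuples. The tuple layout and the `by_stream` helper are illustrative assumptions, not the output format of any particular tool.

```python
from collections import defaultdict

# (sender, frame type, stream ID, summary), in the order seen on the wire.
TRANSCRIPT = [
    ("C", "HEADERS", 1, "GET /index.html"),
    ("C", "HEADERS", 3, "GET /app.css"),
    ("C", "HEADERS", 5, "GET /app.js"),
    ("S", "HEADERS", 1, ":status 200"),
    ("S", "DATA", 1, "<!doctype html>..."),
    ("S", "HEADERS", 3, ":status 200"),
    ("S", "DATA", 3, "body{...}"),
    ("S", "HEADERS", 5, ":status 200"),
    ("S", "DATA", 5, "(()=>{...})"),
]


def by_stream(frames):
    """Group frames by stream ID, preserving wire order within each stream."""
    streams = defaultdict(list)
    for sender, frame_type, stream_id, summary in frames:
        streams[stream_id].append((sender, frame_type, summary))
    return dict(streams)


for stream_id, frames in by_stream(TRANSCRIPT).items():
    print(f"stream {stream_id}:")
    for sender, frame_type, summary in frames:
        print(f"  {sender} {frame_type:<8} {summary}")
```

Each stream reads as an ordinary request-response exchange once regrouped, while the original list order records how the connection actually scheduled them.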
SETTINGS Values Quietly Change the Feel of a Connection
The SETTINGS exchange rarely appears in casual discussions of HTTP/2, but small changes there can reshape performance dramatically.
Consider three especially important settings:
SETTINGS_MAX_CONCURRENT_STREAMS
This caps how many streams the peer should have open at once. If the server advertises a very low value, the browser is forced back into a more serial pattern and loses much of the protocol's concurrency benefit.
SETTINGS_INITIAL_WINDOW_SIZE
This controls how much DATA the sender may transmit on each stream before flow-control credit runs out. Too small and transfers become chatty and stop-start. Too large and buffering pressure rises.
SETTINGS_HEADER_TABLE_SIZE
This controls HPACK dynamic table memory. A larger table can improve compression for repetitive headers, but also consumes more state and potentially increases pressure under abuse.
These settings are one reason two sites can both "support HTTP/2" and still behave quite differently. The protocol version is the same. The local operating envelope is not.
Operators often discover this while load-testing APIs or media delivery. A reverse proxy with conservative defaults may be perfectly safe, but it can also force avoidable queuing or stop-and-go flow-control patterns. The fix is not "enable HTTP/2 harder". The fix is tuning the actual frame-level limits the peer experiences.
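On the wire, each setting is a 16-bit identifier followed by a 32-bit value, so a SETTINGS payload is always a multiple of six bytes. The sketch below encodes and decodes such a payload using the identifiers from the specification. The specific values in `EXAMPLE` are hypothetical tuning choices for illustration, not recommendations.

```python
import struct

# Setting identifiers as defined by the HTTP/2 specification.
SETTINGS_IDS = {
    0x1: "HEADER_TABLE_SIZE",
    0x2: "ENABLE_PUSH",
    0x3: "MAX_CONCURRENT_STREAMS",
    0x4: "INITIAL_WINDOW_SIZE",
    0x5: "MAX_FRAME_SIZE",
    0x6: "MAX_HEADER_LIST_SIZE",
}


def encode_settings(pairs):
    """Encode (identifier, value) pairs into a SETTINGS frame payload."""
    return b"".join(struct.pack(">HI", ident, value) for ident, value in pairs)


def decode_settings(payload: bytes):
    """Decode a SETTINGS payload into a {name: value} dict."""
    if len(payload) % 6 != 0:
        raise ValueError("SETTINGS payload must be a multiple of 6 bytes")
    decoded = {}
    for ident, value in struct.iter_unpack(">HI", payload):
        decoded[SETTINGS_IDS.get(ident, f"UNKNOWN_0x{ident:x}")] = value
    return decoded


# Hypothetical values a proxy might advertise.
EXAMPLE = encode_settings([(0x3, 100), (0x4, 1_048_576), (0x1, 4096)])
print(decode_settings(EXAMPLE))
```

Decoding the SETTINGS frames seen in a capture, and comparing them with what the configuration file was supposed to produce, is often the fastest way to confirm which limits a peer actually experiences.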
This is one more reason HTTP/2 is not merely a browser feature. It is a transport behaviour surface with parameters that infrastructure teams have to understand intentionally.
Error Handling Became More Precise Because One Bad Stream Should Not Kill the Whole Session
One overlooked improvement in HTTP/2 is that it separates stream errors from connection errors much more cleanly.
Under HTTP/1.1, a malformed or failed response often had ugly consequences because there was not much structure beyond the connection itself. Under HTTP/2, the protocol can say:
- this stream is bad, reset it
- this header block is too large, reject it
- this flow-control rule was violated, treat it as a connection problem
That precision matters operationally. A cancelled image fetch should not kill the CSS response and the analytics request. A malformed stream should usually die alone unless it indicates the peer is seriously misbehaving.
Some common error scenarios:
Stream Reset
If the client no longer needs a resource, it can send RST_STREAM. The rest of the connection stays alive.
Graceful Shutdown
If the server is draining for deploy, it can send GOAWAY to stop new streams while finishing existing ones.
Protocol Violation
If the peer breaks framing or flow-control rules badly enough, the connection may have to close because the shared state can no longer be trusted.
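The difference between the first two scenarios is visible in the bytes. The sketch below builds an RST_STREAM frame, which names one stream and an error code, and a GOAWAY frame, which is sent on stream 0 and carries the last stream ID the sender may have processed. The helper names, including `safe_to_retry`, are illustrative, and the retry rule shown is the simple reading of GOAWAY, not a full client policy.

```python
import struct

RST_STREAM, GOAWAY = 0x3, 0x7  # frame type codes
NO_ERROR, CANCEL = 0x0, 0x8    # error codes from the HTTP/2 specification


def _header(length, frame_type, flags, stream_id):
    """Pack the fixed 9-byte frame header."""
    return struct.pack(">BHBBI", length >> 16, length & 0xFFFF,
                       frame_type, flags, stream_id & 0x7FFFFFFF)


def rst_stream(stream_id, error_code):
    """Reset one stream; the rest of the connection is unaffected."""
    return _header(4, RST_STREAM, 0, stream_id) + struct.pack(">I", error_code)


def goaway(last_stream_id, error_code, debug=b""):
    """Announce shutdown; always sent on stream 0."""
    payload = struct.pack(">II", last_stream_id & 0x7FFFFFFF, error_code) + debug
    return _header(len(payload), GOAWAY, 0, 0) + payload


def safe_to_retry(open_stream_ids, last_stream_id):
    """Streams above the GOAWAY last-stream-ID were never processed."""
    return sorted(sid for sid in open_stream_ids if sid > last_stream_id)


cancel_image = rst_stream(7, CANCEL)  # client drops one fetch
drain = goaway(5, NO_ERROR)           # server drains for a deploy
print(safe_to_retry({3, 5, 7, 9}, last_stream_id=5))  # [7, 9]
```

A stream error costs 13 bytes and one exchange. A connection-level protocol violation would instead use GOAWAY with a non-zero error code and end every stream at once.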
This is part of why modern proxies and RPC stacks liked HTTP/2. They gained a more expressive failure model than "socket alive or socket dead". That makes retries, cancellation, graceful deploys, and long-lived sessions more manageable.
The Browser Waterfall Still Reflects Discovery Order, Dependency Chains, and Main-Thread Work
It is easy to give HTTP/2 too much credit for page-load outcomes. Even after the transport improves, the browser still has to:
- parse HTML to discover subresources
- parse CSS to discover fonts and images
- execute JavaScript that may trigger additional requests
- schedule rendering on the main thread
This means a page can negotiate HTTP/2 perfectly and still feel slow if:
- CSS arrives late and blocks first paint
- JavaScript is huge and monopolises the main thread
- lazy loading is configured badly
- the HTML itself is delayed by backend compute
The useful mental model is that HTTP/2 shrinks one important set of transport bottlenecks. It does not erase the rest of the browser's dependency graph.
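A toy model makes the dependency point concrete. The sketch below assumes unlimited concurrency, so a resource starts fetching the moment its parent finishes, and computes when each resource completes. The resource names and millisecond figures are invented for illustration, and main-thread work is ignored entirely.

```python
def finish_times(resources):
    """Map each resource to its completion time in ms.

    resources: name -> (name of the resource that reveals it, or None; fetch ms)
    """
    done = {}

    def finish(name):
        if name not in done:
            parent, fetch_ms = resources[name]
            start = finish(parent) if parent else 0  # cannot start before discovery
            done[name] = start + fetch_ms
        return done[name]

    for name in resources:
        finish(name)
    return done


# Hypothetical page: HTML reveals CSS and JS; CSS reveals a font; JS calls an API.
PAGE = {
    "index.html": (None, 200),
    "app.css": ("index.html", 80),
    "font.woff2": ("app.css", 60),
    "app.js": ("index.html", 150),
    "api/data": ("app.js", 120),
}

for name, ms in finish_times(PAGE).items():
    print(f"{name:<12} done at {ms} ms")
```

Even with perfect multiplexing, the last resource in this model completes at 470 ms because it sits at the end of a three-step discovery chain. No transport change shortens that chain; only earlier discovery or a smaller dependency graph does.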
For example, a site may improve its first contentful paint from 1.8 seconds to 1.2 seconds after moving from HTTP/1.1 sharded delivery to clean HTTP/2, but still remain slower than it should be because:
- the CSS is render-blocking and oversized
- the font strategy causes layout instability
- the origin HTML response is slow
Transport and application structure therefore have to be evaluated together. Concrete detail matters here, because this is exactly where shallow protocol explanations become misleading. A faster wire format is helpful. It is not a substitute for good page architecture.
HTTP/2 Matters Today Mostly as the Baseline That Replaced HTTP/1.1 on the Public Web
HTTP/3 gets attention now, but HTTP/2 is still the operational baseline for much of the encrypted web. It cleaned up years of awkward client behaviour, made one-connection delivery practical, and taught the ecosystem how to build framed, multiplexed HTTP stacks.
Its limitations are also instructive. HTTP/2 showed that fixing message ordering above TCP helped a lot, but not enough to eliminate transport-level coupling. That experience directly informed QUIC and HTTP/3.
If you keep one mental model, use this one: HTTP/2 turns one TCP connection into many framed logical streams, compresses repetitive headers, adds explicit flow control, and lets the client and server interleave work far more efficiently than HTTP/1.1. It solves application-layer queuing. It does not solve TCP's in-order delivery rules.
That is how HTTP/2 actually works.