How USB Actually Works, From Enumeration To Thunderbolt
Plug a USB stick into a laptop in a Berlin office and a small file manager window opens a second or two later. It feels like nothing happened. The stick sits there, the laptop notices, a volume is mounted, and you drag a file across. Every part of that experience is the product of an almost absurd amount of engineering, accumulated over nearly three decades of revisions. It stretches from the original USB 1.0 spec of 1996, with its 1.5 Mbps low-speed and 12 Mbps full-speed modes, all the way to the 80 Gbps USB4 v2 parts shipping on recent Thunderbolt docks.
USB is deceptive because it is layered. The layers are tightly integrated and normally invisible, but they are very real. There is a physical layer that has been reinvented for each major generation (the half-duplex differential pair of USB 1.x and 2.0, the dual-simplex SuperSpeed pairs of USB 3.x, and the multi-lane SerDes links of USB4). A link layer with framing, addressing, and CRCs. A transaction layer that turns byte streams into reliable packets and back again. A device model with descriptors, configurations, interfaces, endpoints, and alternate settings. A class system that lets the kernel load the right driver for a random device without having seen it before. A power delivery sideband that negotiates voltage levels up to 48 volts. And in the last few years, an entire alternate-mode mechanism that lets DisplayPort, PCIe, or Thunderbolt share the same connector with regular USB. Almost all of this is hidden from the application layer, which just calls read() or libusb_bulk_transfer() and expects data to arrive.
This article walks through the full stack, from the pins in the plug to the moment your file manager pops open. The goal is not to memorise the spec, which is thousands of pages long across USB 2.0, USB 3.2, USB4, and USB Power Delivery, but to see how the pieces fit together so that the behaviour of real devices stops feeling like magic. By the end it should be clear why a USB enumeration "just works", why it sometimes takes several seconds, why USB-C alt modes are fundamentally different from a regular USB connection, and why Thunderbolt is not really USB even though it uses the same port.
The Physical Layer: Three Generations Stacked In One Cable
The USB connector you plug in today is mechanical and electrical at the same time. The USB-A plug that dominated for twenty years carried four pins: VBUS (5 volts), GND, D+, and D-. The two data lines formed a differential pair carrying a single half-duplex channel, running at 1.5, 12, or 480 Mbps depending on the generation. That differential pair is what we now call the "USB 2.0 link", and virtually every USB cable still carries it for backwards compatibility.
When USB 3.0 added 5 Gbps SuperSpeed in 2008, it did not replace the USB 2.0 link. It added two new differential pairs, SSTX and SSRX, on separate pins of the connector, giving SuperSpeed a dual-simplex link with one pair in each direction. The old USB 2.0 pair kept running alongside them as an independent bus for USB 2.0 and older devices. A USB 3.0 device is really two devices in a trench coat: a classic USB 2.0 interface and a SuperSpeed interface behind the same connector, but it operates over only one of them at a time. At attach it tries SuperSpeed first, and only if SuperSpeed link training fails, for example because the port or cable has no SuperSpeed pairs, does it connect over the USB 2.0 pair instead. This is why you can plug a USB 3.0 stick into a USB 2.0 port and it still works, just slower, and why the USB 2.0 pair has to be present on every port and every cable.
USB 3.1 Gen 2 (10 Gbps) and USB 3.2 Gen 2x2 (20 Gbps) raised the line rate and, in Gen 2x2, used both sets of high-speed pairs in a USB-C cable, but they did not change the basic structure. Then USB4 arrived and rewrote everything. USB4 uses the same USB-C connector but runs a completely different protocol stack underneath, derived from Thunderbolt 3. The four high-speed pairs are organised as two lanes, each running at 10 or 20 Gbps per direction (20 or 40 Gbps in total), and the link carries a tunnelled mix of USB 3.x traffic, DisplayPort, and PCIe over a shared transport layer. USB4 v2, published in 2022, pushed each lane to 40 Gbps using PAM3 signalling, for 80 Gbps symmetric or, in an asymmetric configuration, 120 Gbps in one direction and 40 Gbps in the other. At those speeds the cable is a carefully tuned transmission line, the SerDes blocks on both ends run adaptive equalisation, and long runs need active cables with retimers built in; a cheap passive extension will not do.
The USB-C connector itself is what made all of this possible without exploding the number of ports on a laptop. The receptacle has 24 pins: four high-speed differential pairs (one TX pair and one RX pair in each row, so two TX and two RX pairs in total), a USB 2.0 D+/D- pair duplicated in both rows (the cable carries only one pair, and the host typically ties the two positions together so either orientation works), two configuration channel pins (CC1 and CC2), two sideband use (SBU) pins, and four VBUS and four GND pins. The pinout is rotationally symmetric, which is why USB-C is reversible: the plug mates either way up, and the host works out which way it went by checking which of its two CC pins is connected through the cable. It then routes the high-speed lanes internally, through a multiplexer, to match.
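The CC pin is where a lot of the USB-C cleverness lives. It carries a low-speed BMC (Biphase Mark Coding) signal for USB Power Delivery negotiation, it identifies the cable (whether it is active or passive and how much current it is rated for, read from the cable's electronic marker chip if it has one), and, through a pull-up resistor (Rp) on the source side and a pull-down (Rd) on the sink side, it tells both ends which of them is the source and which is the sink. Without the CC pin doing its job the port does nothing: a USB-C source keeps VBUS switched off until it sees a sink's pull-down on CC, so as far as the host is concerned no device is attached.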
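A source-side view of the CC pins can be sketched as a small decision function. This is a minimal sketch, not a port controller: it assumes each CC pin has already been classified by analogue comparators as open, Rd (the sink's pull-down) or Ra (the load an electronically marked cable presents), and it ignores the debounce timers and Rp current levels a real controller handles. The names Term and classify_attach are hypothetical.

```python
from enum import Enum


class Term(Enum):
    OPEN = "open"  # nothing connected to this CC pin
    RD = "Rd"      # sink pull-down (about 5.1 kOhm)
    RA = "Ra"      # VCONN load of an e-marked cable or powered accessory


def classify_attach(cc1: Term, cc2: Term) -> dict:
    """Decide what a source-side port sees from the terminations on CC1 and CC2."""
    if cc1 is Term.RD and cc2 is Term.RD:
        return {"kind": "debug_accessory"}
    if cc1 is Term.RA and cc2 is Term.RA:
        return {"kind": "audio_accessory"}
    if Term.RD in (cc1, cc2):
        # Only one CC pin reaches the sink through the cable, so the pin that
        # sees Rd tells the source which way up the plug went in.
        flipped = cc2 is Term.RD
        other = cc1 if flipped else cc2
        return {
            "kind": "sink",
            "orientation": "flipped" if flipped else "normal",
            # Ra on the other pin: the cable has an e-marker that wants VCONN.
            "supply_vconn": other is Term.RA,
        }
    # Open/open, or Ra with no sink behind it: stay unattached, VBUS stays off.
    return {"kind": "unattached"}


print(classify_attach(Term.RA, Term.RD))
```

With Ra on CC1 and Rd on CC2 this reports a flipped sink behind an e-marked cable, which is the case where the source would route its lanes for the flipped orientation and supply VCONN on CC1.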
Enumeration: The Dance Before The Dance
What actually happens when you plug a USB stick into a laptop? Contrary to what it looks like, the sequence is elaborate and strictly ordered, and it takes somewhere between 50 milliseconds and several seconds depending on the device.
The host controller is polling the port, looking for a change in the CC pin voltage (on USB-C) or the D+/D- termination (on older connectors). On USB 2.0, a device indicates its presence by pulling one of the data lines high through a 1.5 kΩ resistor. Pulling D+ high means full-speed (12 Mbps), pulling D- high means low-speed (1.5 Mbps). The host sees this and knows something attached. A high-speed device also starts out looking like a full-speed one, and only announces that it can run at 480 Mbps during the reset that follows, with a "chirp" handshake.
The host then asserts a bus reset by driving both data lines low (a state called SE0) for at least 10 milliseconds. The device sees the reset and transitions into a known state: address 0, endpoint 0 active, default control pipe ready. Every device starts at address 0, regardless of how many other devices are already on the bus, which is why the host enumerates devices one at a time: only one device may be answering at address 0.
After the reset, the host sends a Get Descriptor request on endpoint 0 for the device descriptor. This is a control transfer, which has three stages: a setup packet carrying the request, one or more data packets carrying the response, and a status packet acknowledging completion. The device descriptor is 18 bytes long and contains the USB version the device supports, its vendor ID (VID), its product ID (PID), its device class, the max packet size for endpoint 0, and the number of possible configurations. On USB 2.0 many hosts read only the first 8 bytes at this stage (or ask for 64 and accept whatever arrives in the first packet), which is just enough to learn the endpoint 0 max packet size, and many then reset the port a second time.
Next, after that second reset on hosts that perform one, the host assigns the device an address with a Set Address control transfer. The device switches to the new address only once the status stage of that transfer has completed, and from then on it no longer answers at address 0. All subsequent communication uses the new address.
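The control requests in this walkthrough are all built from the same 8-byte setup packet. The sketch below packs them with Python's struct module; the request codes and the bmRequestType layout come from chapter 9 of the USB 2.0 specification, while the helper names and the example address 5 are illustrative assumptions. Reading the 9-byte configuration header before the full configuration reflects what hosts commonly do, not a requirement.

```python
import struct

# Standard request codes and descriptor types (USB 2.0, chapter 9).
GET_DESCRIPTOR = 0x06
SET_ADDRESS = 0x05
SET_CONFIGURATION = 0x09
DESC_DEVICE = 0x01
DESC_CONFIGURATION = 0x02


def setup_packet(bm_request_type: int, b_request: int, w_value: int,
                 w_index: int, w_length: int) -> bytes:
    """Pack the 8-byte SETUP stage of a control transfer (little-endian)."""
    return struct.pack("<BBHHH", bm_request_type, b_request, w_value, w_index, w_length)


def get_descriptor(desc_type: int, index: int, length: int) -> bytes:
    # 0x80: device-to-host, standard request, recipient = device.
    # wValue carries the descriptor type in the high byte, the index in the low byte.
    return setup_packet(0x80, GET_DESCRIPTOR, (desc_type << 8) | index, 0, length)


def set_address(address: int) -> bytes:
    if not 1 <= address <= 127:
        raise ValueError("USB addresses are 7 bits; 0 is the default address")
    # 0x00: host-to-device, standard, device. No data stage, so wLength is 0.
    return setup_packet(0x00, SET_ADDRESS, address, 0, 0)


def set_configuration(value: int) -> bytes:
    return setup_packet(0x00, SET_CONFIGURATION, value, 0, 0)


# The enumeration steps described above, as the setup packets a host would send.
enumeration = [
    get_descriptor(DESC_DEVICE, 0, 8),         # first 8 bytes, still at address 0
    set_address(5),                            # 5 is an arbitrary example address
    get_descriptor(DESC_DEVICE, 0, 18),        # full device descriptor
    get_descriptor(DESC_CONFIGURATION, 0, 9),  # config header, to learn wTotalLength
    set_configuration(1),
]
for pkt in enumeration:
    print(pkt.hex(" "))
```

The first line printed is `80 06 00 01 00 00 08 00`, the same eight bytes a USB analyser would show for the first request after reset.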
The host now reads the full device descriptor, then the configuration descriptors. Configuration descriptors are nested: a top-level configuration descriptor lists one or more interfaces, each interface lists one or more endpoints (with their direction, type, max packet size, and polling interval), and there may be class-specific descriptors embedded at each level. Hosts typically read the first 9 bytes to learn the total length (wTotalLength), then issue a second Get Descriptor for the whole configuration, which returns the entire tree as a flat byte blob that the host parses in memory.
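Because every descriptor starts with bLength and bDescriptorType, the flat blob can be split without knowing every class-specific format. A minimal sketch, assuming the blob has already been read from the device; the example bytes describe a hypothetical bulk-only mass storage device, and the function names are made up for illustration.

```python
def walk_descriptors(blob: bytes):
    """Yield (bDescriptorType, raw_bytes) for each descriptor in a config blob."""
    offset = 0
    while offset + 2 <= len(blob):
        length, dtype = blob[offset], blob[offset + 1]
        if length < 2 or offset + length > len(blob):
            raise ValueError(f"malformed descriptor at offset {offset}")
        yield dtype, blob[offset:offset + length]
        offset += length  # bLength tells us where the next descriptor starts


def interfaces_and_endpoints(blob: bytes) -> dict:
    """Group endpoint addresses under the (interface, alt setting) that precedes them."""
    tree, current = {}, None
    for dtype, desc in walk_descriptors(blob):
        if dtype == 0x04:                             # INTERFACE descriptor
            current = (desc[2], desc[3])              # bInterfaceNumber, bAlternateSetting
            tree[current] = []
        elif dtype == 0x05 and current is not None:   # ENDPOINT descriptor
            tree[current].append(desc[2])             # bEndpointAddress
    return tree


config_blob = bytes.fromhex(
    "09 02 20 00 01 01 00 80 32 "   # configuration: wTotalLength=32, 1 interface
    "09 04 00 00 02 08 06 50 00 "   # interface 0: mass storage, SCSI, bulk-only
    "07 05 81 02 00 02 00 "         # endpoint 0x81: bulk IN, 512 bytes
    "07 05 02 02 00 02 00"          # endpoint 0x02: bulk OUT, 512 bytes
)
print(interfaces_and_endpoints(config_blob))  # {(0, 0): [129, 2]}
```

Unknown class-specific descriptors are simply skipped by length, which is what lets a generic parser survive descriptors it has never seen.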
Once the host has the descriptors, it picks a configuration with a Set Configuration control transfer. A device may support multiple configurations (for example, one that draws less power if bus power is limited), but almost all devices only advertise one. Setting a configuration activates the interfaces and endpoints, and the device is now ready to handle normal I/O.
On the host side, the kernel now has a complete map of the device's capabilities. It walks the descriptors, matches each interface against its drivers' ID tables by class, subclass, and protocol code (and, for devices that need special handling, by vendor and product ID), and loads the matching class driver. A mass storage device matches class 0x08, subclass 0x06 (SCSI transparent command set), protocol 0x50 (bulk-only transport), and the kernel loads the usb-storage driver. An HID keyboard matches class 0x03, subclass 0x01 (boot interface), protocol 0x01 (keyboard), and the kernel loads usbhid. If no matching driver exists, the interface sits unclaimed until a userspace program claims it, for example through libusb's libusb_claim_interface().
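Class-based matching boils down to comparing each interface's (class, subclass, protocol) triple against driver tables, with wildcards for fields a driver does not care about. This is a toy model: the table is hypothetical and heavily trimmed, and a real kernel also matches on vendor and product IDs and applies quirks before falling back to class matching.

```python
# Hypothetical, trimmed match table: (class, subclass, protocol) -> driver.
# None acts as a wildcard for that field.
MATCH_TABLE = [
    ((0x08, 0x06, 0x50), "usb-storage"),   # mass storage, SCSI, bulk-only
    ((0x03, None, None), "usbhid"),        # any HID interface
    ((0x09, None, None), "hub"),           # hub class
]


def match_driver(b_class: int, b_subclass: int, b_protocol: int):
    """Return the first driver whose entry matches, or None if unclaimed."""
    for (c, s, p), driver in MATCH_TABLE:
        if c == b_class and s in (None, b_subclass) and p in (None, b_protocol):
            return driver
    return None  # unclaimed: userspace may claim the interface, e.g. via libusb


print(match_driver(0x03, 0x01, 0x01))  # usbhid
```

Ordering matters in a table like this: more specific entries go first so that a wildcard entry does not shadow them.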
For a USB 3.x device the same sequence runs over the SuperSpeed link instead, using a similar but slightly different control transfer flow; the device does not enumerate a second time on the USB 2.0 pair. SuperSpeed enumeration starts with link training at the physical layer (which itself takes tens of milliseconds while the two ends negotiate timings and equalisation), followed by the same descriptor dance at the logical layer.
For a typical USB-C flash drive, the visible delay between insertion and "drive ready to read" is dominated by three things: USB-C attach detection and CC negotiation (the Type-C specification's attach debounce alone is 100 to 200 milliseconds), SuperSpeed link training (another 50 to 150 milliseconds), and the enumeration dance itself (perhaps 30 to 100 milliseconds). Then the mass storage driver has to issue a SCSI INQUIRY, a TEST UNIT READY, and a READ CAPACITY before it knows the device is ready and how big it is, and the block layer has to read the partition table and mount the filesystem. Total: often under half a second, sometimes a few seconds on a slow device.
The Device Model: Descriptors All The Way Down
The descriptor tree is the heart of how USB is self-describing. Every USB device is required to expose a structured set of descriptors through endpoint 0, and the host uses those descriptors to figure out what it is looking at without any prior knowledge.
At the top is the device descriptor. It contains:
- bcdUSB: which USB version the device claims to support, in binary-coded decimal (0x0200, 0x0300, 0x0310, etc.).
- bDeviceClass, bDeviceSubClass, bDeviceProtocol: either a specific device class, or 0x00 meaning "see the interface descriptor for the class", or 0xEF meaning "miscellaneous, use interface association descriptors".
- bMaxPacketSize0: the maximum packet size for endpoint 0. For high-speed devices this must be 64. For SuperSpeed it is 512 bytes, which the field encodes as an exponent: it holds 9, meaning 2^9.
- idVendor and idProduct: the VID and PID that identify the device.
- bcdDevice: the device firmware revision.
- iManufacturer, iProduct, iSerialNumber: indices into the device's string descriptors (0 means no string).
- bNumConfigurations: how many configurations this device supports.
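Parsing the 18-byte device descriptor is a single struct unpack, since every multi-byte USB field is little-endian. A minimal sketch; the example bytes, including VID 0x1234 and PID 0x5678, are hypothetical. It also handles the SuperSpeed twist that bMaxPacketSize0 is stored as an exponent, which this sketch keys off bcdUSB being 0x0300 or later.

```python
import struct
from typing import NamedTuple


class DeviceDescriptor(NamedTuple):
    bLength: int
    bDescriptorType: int
    bcdUSB: int
    bDeviceClass: int
    bDeviceSubClass: int
    bDeviceProtocol: int
    bMaxPacketSize0: int
    idVendor: int
    idProduct: int
    bcdDevice: int
    iManufacturer: int
    iProduct: int
    iSerialNumber: int
    bNumConfigurations: int


def parse_device_descriptor(raw: bytes) -> DeviceDescriptor:
    if len(raw) < 18 or raw[0] != 18 or raw[1] != 0x01:
        raise ValueError("not a device descriptor")
    # B = 1 byte, H = 2 bytes, all little-endian: 18 bytes in total.
    return DeviceDescriptor(*struct.unpack("<BBHBBBBHHHBBBB", raw[:18]))


def ep0_max_packet(desc: DeviceDescriptor) -> int:
    # USB 3.x stores an exponent (9 -> 512 bytes); USB 2.0 stores the byte count.
    return 1 << desc.bMaxPacketSize0 if desc.bcdUSB >= 0x0300 else desc.bMaxPacketSize0


example_device = bytes.fromhex("12 01 00 02 00 00 00 40 34 12 78 56 00 01 01 02 03 01")
d = parse_device_descriptor(example_device)
print(hex(d.idVendor), hex(d.idProduct), ep0_max_packet(d))  # 0x1234 0x5678 64
```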
Below the device descriptor comes the configuration descriptor, which describes one possible operating configuration. Its key fields are bNumInterfaces (how many interfaces this configuration exposes), bmAttributes (self-powered versus bus-powered, remote wake support), and bMaxPower (in 2 mA units for USB 2.0, 8 mA units for SuperSpeed).
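Decoding the power fields of the configuration header is mostly bit-masking and a unit conversion. A sketch, assuming the standard bit positions in bmAttributes (bit 6 self-powered, bit 5 remote wakeup, bit 7 reserved and set) and the 2 mA / 8 mA bMaxPower units described above; the function name is hypothetical.

```python
def decode_config_power(bm_attributes: int, b_max_power: int, superspeed: bool) -> dict:
    """Decode the power-related fields of a configuration descriptor header."""
    unit_ma = 8 if superspeed else 2  # bMaxPower unit depends on the operating speed
    return {
        "self_powered": bool(bm_attributes & 0x40),   # bit 6
        "remote_wakeup": bool(bm_attributes & 0x20),  # bit 5
        "max_power_ma": b_max_power * unit_ma,
    }


# 0x80 / 0x32 at USB 2.0: bus-powered, no remote wakeup, 100 mA.
print(decode_config_power(0x80, 0x32, superspeed=False))
```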
Inside the configuration, one or more interface descriptors describe logical functions the device provides. An interface has its own class, subclass, and protocol codes, an alternate setting number, and a count of endpoints. Interfaces are important because a single device can expose several at once. A USB webcam typically exposes a video function (a control interface plus a streaming interface), an audio function for the microphone, and sometimes an HID interface for a snapshot button, all on the same device, and each function is bound to its own kernel driver.
Below each interface, the endpoint descriptors list the pipes the interface uses for actual data transfer. An endpoint has:
- bEndpointAddress: the endpoint number and direction. Bit 7 is set for IN endpoints (device to host).
- bmAttributes: the transfer type (Control, Isochronous, Bulk, or Interrupt).
- wMaxPacketSize: the max packet size.
- bInterval: the polling interval for interrupt and isochronous endpoints.
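The endpoint fields above decode with a few masks. A minimal sketch: bits 3:0 of bEndpointAddress hold the endpoint number, bits 1:0 of bmAttributes hold the transfer type, and for high-speed periodic endpoints bits 12:11 of wMaxPacketSize give the number of additional transactions per microframe (a USB 2.0 detail not covered in the text). The function name and example values are hypothetical.

```python
TRANSFER_TYPES = ("control", "isochronous", "bulk", "interrupt")  # bmAttributes bits 1:0


def decode_endpoint(b_endpoint_address: int, bm_attributes: int, w_max_packet_size: int) -> dict:
    return {
        "number": b_endpoint_address & 0x0F,
        "direction": "IN" if b_endpoint_address & 0x80 else "OUT",
        "type": TRANSFER_TYPES[bm_attributes & 0x03],
        "max_packet": w_max_packet_size & 0x07FF,                  # bits 10:0
        "extra_per_microframe": (w_max_packet_size >> 11) & 0x03,  # bits 12:11
    }


print(decode_endpoint(0x81, 0x02, 512))     # bulk IN endpoint 1, 512-byte packets
print(decode_endpoint(0x83, 0x01, 0x1400))  # high-bandwidth iso: 1024 bytes, 2 extra
```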
The four transfer types are the core of USB data flow. Control transfers are used for device enumeration and configuration, and for small command traffic. They are bidirectional and reliable, and they share bus time with everything else; the scheduler keeps a slice of each frame free so they are never starved, but they are not meant for moving bulk data. Isochronous transfers are used for streaming audio and video: they guarantee fixed bandwidth and fixed timing but have no retransmission on error, so a dropped packet is just lost. Bulk transfers are used for large, bursty, reliable data: mass storage, printers, network adapters. They get whatever bus bandwidth is left over and are retransmitted on error. Interrupt transfers are used for small periodic data: keyboards, mice, status polling on other devices. They guarantee a worst-case latency by reserving bus time at a specified interval.
The host controller schedules all of this within a strict frame structure. On USB 2.0, the bus is divided into 1 millisecond frames, subdivided into eight 125 microsecond microframes at high speed. The host allocates a time budget for isochronous and interrupt transfers first (they are latency-sensitive, and at high speed they may claim at most 80 percent of each microframe), then fills the remaining time with bulk and control transfers on a best-effort basis. This is how USB can carry high-bandwidth bulk traffic and real-time audio on the same bus without either interfering with the other: the real-time traffic is guaranteed its slots by the scheduler.
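The scheduling rule can be illustrated with a toy model of one high-speed microframe: 480 Mbps for 125 microseconds is 7,500 raw bytes, of which periodic traffic may claim at most 80 percent under USB 2.0. This sketch counts payload bytes only and ignores tokens, handshakes, bit stuffing and inter-packet gaps, and the endpoint names are made up; a real host enforces the periodic limit when an endpoint is configured rather than per microframe.

```python
MICROFRAME_BYTES = 480_000_000 // 8 * 125 // 1_000_000  # 7500 raw bytes per 125 us
PERIODIC_LIMIT = int(MICROFRAME_BYTES * 0.8)            # periodic share capped at 80%


def schedule_microframe(periodic: list[tuple[str, int]], bulk_queue: list[tuple[str, int]]):
    """Toy scheduler: periodic (iso/interrupt) first, bulk fills what is left.

    Mutates bulk_queue, removing the bulk packets that were scheduled.
    """
    used, plan = 0, []
    for name, size in periodic:
        if used + size > PERIODIC_LIMIT:
            # A real host would have refused this endpoint at configuration time.
            raise RuntimeError(f"{name} would exceed the periodic budget")
        plan.append(name)
        used += size
    while bulk_queue and used + bulk_queue[0][1] <= MICROFRAME_BYTES:
        name, size = bulk_queue.pop(0)
        plan.append(name)
        used += size
    return plan, used


periodic = [("webcam-iso", 3072), ("keyboard-int", 8)]
bulk = [("disk", 512) for _ in range(20)]
plan, used = schedule_microframe(periodic, bulk)
print(plan.count("disk"), used)  # 8 bulk packets fit; 12 wait for the next microframe
```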
On SuperSpeed the timing unit is a 125 microsecond bus interval, and constant polling is largely replaced by asynchronous notifications. Instead of asking an idle endpoint for data over and over, the host asks once; a device with nothing to send answers NRDY (Not Ready), and later sends an ERDY (Endpoint Ready) packet when it has data, at which point the host issues the transaction. This cuts the power overhead of SuperSpeed enormously, because a SuperSpeed device can sit in a low-power link state until something interesting happens.
Host-Controlled, And Why It Matters
A point that is easy to miss in the descriptor discussion: USB is a strictly host-controlled bus. Devices cannot talk unless the host asks them to. Every transaction on the wire starts with a token packet from the host telling a specific endpoint to either send or receive data. Devices can never spontaneously initiate a transfer.
This design choice has enormous consequences. It simplifies the hardware: a USB device only needs one side of a transaction state machine, not a full bus arbiter. It eliminates collisions: there is only one host speaking at a time, and devices only respond to direct requests. It lets the host guarantee bandwidth allocation: if the host has pre-scheduled an isochronous slot for the webcam, no other device can barge into it, because no other device can transmit without being asked. And it is why a USB device cannot "wake up" the host in the traditional sense; the best it can do is assert a wake signal on the bus that the host interprets as a request to leave suspend.
The downside is that every endpoint has to be polled. For interrupt endpoints on USB 2.0 (like a keyboard), the host polls at whatever interval the endpoint requests in bInterval, commonly between 1 and 10 milliseconds, and most of those polls come back empty, which wastes bus time. SuperSpeed reduced this with the NRDY/ERDY mechanism, but the basic host-controlled model remains.
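How often the host polls is derived from bInterval, and the encoding depends on speed: low- and full-speed interrupt endpoints give a plain count of 1 ms frames, while full-speed isochronous and all high-speed and SuperSpeed periodic endpoints use an exponent, 2^(bInterval-1) frames or microframes. A minimal sketch of that rule; the function name and the speed/transfer strings are hypothetical.

```python
def polling_period_us(b_interval: int, speed: str, transfer: str) -> int:
    """Service interval in microseconds for a periodic endpoint.

    speed is "low", "full", "high" or "super"; transfer is "interrupt" or "isochronous".
    """
    if b_interval < 1:
        raise ValueError("bInterval must be at least 1 for periodic endpoints")
    if speed in ("high", "super"):
        # Exponent form: 2**(bInterval-1) microframes (bus intervals) of 125 us.
        return (1 << (b_interval - 1)) * 125
    if transfer == "isochronous":
        # Full-speed isochronous also uses the exponent form, in 1 ms frames.
        return (1 << (b_interval - 1)) * 1000
    # Low- and full-speed interrupt: a plain count of 1 ms frames.
    return b_interval * 1000


# A full-speed keyboard with bInterval = 10 is polled 100 times a second.
print(1_000_000 // polling_period_us(10, "full", "interrupt"))
```

The same bInterval value therefore means very different rates at different speeds: 10 is 10 ms at full speed but 2^9 microframes (64 ms) at high speed.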
This also explains why USB is not symmetric between host and device roles. A device cannot become a host, and a host cannot become a device, unless the hardware specifically supports dual role. USB On-The-Go (OTG), introduced for USB 2.0, allowed certain devices (notably smartphones) to act as either: the ID pin on the micro-USB connector decided which end started as host, and a protocol called Host Negotiation Protocol (HNP) let the two ends swap roles afterwards. On USB-C, dual role is much cleaner: the CC pins decide which end starts as source and which as sink, and once a USB Power Delivery contract is in place the two ends can exchange roles with a Power Role Swap (PR_Swap) or Data Role Swap (DR_Swap) message.
Classes: Plug In Anything, It Just Works
The reason a random USB device you have never seen before just works is the USB class system. Classes are standardised behaviour profiles that tell the host "if you treat me according to class X, everything will be fine, and you do not need a vendor-specific driver".
The most important classes in practice are:
- HID (0x03): Human Interface Device. Keyboards, mice, game controllers, touchscreens. HID uses a structured report format: the device describes its own report layout with a HID report descriptor, and the host parses that descriptor to understand the semantics of each byte in an incoming interrupt transfer. This is why a brand-new exotic gamepad can work without a driver: its report descriptor might tell the kernel "byte 0 bits 0-3 are the D-pad, byte 1 is analogue X, byte 2 is analogue Y", and the kernel forwards events to the input subsystem.
- Mass Storage (0x08): USB flash drives, external hard drives, SSDs. Mass storage devices tunnel SCSI commands over USB bulk endpoints: the host sends a Command Block Wrapper (CBW) containing the SCSI command on a bulk OUT endpoint, data moves on the bulk IN or bulk OUT endpoint, and the device closes with a Command Status Wrapper (CSW) on the bulk IN endpoint. The kernel's SCSI layer sees the same command set it would send to any other SCSI disk. This is why mounting a USB stick on Linux uses the same sd driver as a SATA disk.
- CDC (0x02): Communications Device Class. Modems, network adapters, serial ports. CDC has a large number of subclasses for different communication styles: CDC-ACM (Abstract Control Model) for virtual serial ports, CDC-ECM (Ethernet Control Model) for Ethernet adapters, CDC-NCM (Network Control Model) for higher-throughput Ethernet, and so on. CDC-ACM is why many Arduino boards show up as /dev/ttyACM0 on Linux without any driver work.
- Audio (0x01): USB audio interfaces, headsets, microphones. The Audio class uses isochronous endpoints for PCM sample streaming; in asynchronous mode, a feedback endpoint lets the device tell the host how many samples to send per interval, compensating for clock drift between host and device.
- Video (0x0E): UVC, the USB Video Class. Webcams. UVC devices stream compressed or uncompressed video over isochronous endpoints and describe their supported formats and frame rates through video-class descriptors. This is why almost every USB webcam works on Linux, macOS, and Windows without a vendor driver.
- Printer (0x07), Smart Card (0x0B), Billboard (0x11), DFU (0xFE, subclass 0x01): a long tail of smaller classes, each solving a specific problem. Billboard, for example, is purely informational: it lets a device report which USB-C alternate modes it supports and whether entering them succeeded, so the host can present an error message if the user plugs in an incompatible accessory.
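The mass storage Bulk-Only Transport wrappers are small fixed-layout structures: a 31-byte CBW whose signature reads "USBC" and a 13-byte CSW whose signature reads "USBS". A sketch that builds a CBW for READ CAPACITY(10) (opcode 0x25) and parses a CSW; the helper names and the tag value are illustrative, and a real driver would also check that the CSW tag matches the CBW it sent and handle the residue.

```python
import struct

CBW_SIGNATURE = 0x43425355  # bytes "USBC" when packed little-endian
CSW_SIGNATURE = 0x53425355  # bytes "USBS"


def build_cbw(tag: int, transfer_length: int, data_in: bool, lun: int, cdb: bytes) -> bytes:
    """31-byte Command Block Wrapper for the Bulk-Only Transport."""
    if not 1 <= len(cdb) <= 16:
        raise ValueError("CDB must be 1..16 bytes")
    flags = 0x80 if data_in else 0x00  # bit 7 set: data phase is device-to-host
    # "16s" zero-pads the SCSI command block to its fixed 16-byte field.
    return struct.pack("<IIIBBB16s", CBW_SIGNATURE, tag, transfer_length,
                       flags, lun, len(cdb), cdb)


def parse_csw(raw: bytes) -> dict:
    """13-byte Command Status Wrapper: status 0 = passed, 1 = failed, 2 = phase error."""
    sig, tag, residue, status = struct.unpack("<IIIB", raw)
    if sig != CSW_SIGNATURE:
        raise ValueError("bad CSW signature")
    return {"tag": tag, "residue": residue, "status": status}


# READ CAPACITY(10): 10-byte CDB starting with opcode 0x25; the device returns 8 bytes.
read_capacity = build_cbw(tag=1, transfer_length=8, data_in=True, lun=0,
                          cdb=bytes([0x25]) + bytes(9))
print(read_capacity[:4], len(read_capacity))  # b'USBC' 31
```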
Classes are what makes USB usable as a universal bus. Without them, every new device would require a vendor driver and every operating system would need to ship a pile of kernel code for every peripheral ever made. With them, the host can walk up to a device it has never heard of, see class 0x08 subclass 0x06 on the interface descriptor, and just load the mass storage driver. It works even on operating systems the device manufacturer never tested.
There is a limit to this, of course. Some devices need vendor-specific behaviour that no class captures, and those use vendor-defined interface classes (class 0xFF) with their own protocol. Many game controllers, many printers, many exotic peripherals fall into this category and need a matching userspace or kernel driver. But the majority of ordinary devices plug into the class-based driver stack and just work.
USB Power Delivery: From 2.5 Watts To 240 Watts
Original USB supplied 5 volts, and a device could draw 100 mA (0.5 watts, barely enough for a low-power keyboard) until the host configured it, after which a high-power device could draw up to 500 mA (2.5 watts); USB 2.0 kept those limits. USB 3.0 raised the configured limit to 900 mA (4.5 watts). Then USB Battery Charging 1.2 allowed up to 1.5 A from dedicated chargers. All of this was still at the fixed 5-volt rail.
USB Power Delivery (USB PD), introduced as an optional spec in 2012 and now the normal way for USB-C chargers to go beyond the 15 watts (5 V at 3 A) that USB-C current advertisement alone allows, changed the game. USB PD is a two-way negotiation between source and sink that can reach 100 watts on USB-C cables rated for 5 A (USB PD 2.0) and 240 watts on USB-C cables that support Extended Power Range (EPR) at 48 volts (USB PD 3.1, ratified in 2021). This is how a single USB-C cable can power a 16-inch laptop with a discrete GPU, or fast-charge some phones from empty to 50 percent in fifteen minutes.
Power Delivery negotiation happens on the CC pin, using BMC-encoded packets. The basic flow:
- The source advertises its capabilities by sending a Source_Capabilities message (and repeating it after attach until the sink answers), listing every Power Data Object (PDO) it supports. A PDO is a tuple of voltage, maximum current, and type (fixed, battery, variable, or PPS for programmable power). A modern 100 W charger typically advertises 5 V at 3 A, 9 V at 3 A, 15 V at 3 A, and 20 V at 5 A, plus a PPS range for fine-grained control.
- The sink replies with a Request message selecting one of those PDOs by its position and specifying the current it wants. A laptop might request "PDO 4, 5 A operating, 5 A maximum". If the source can meet the request, it replies with Accept, transitions to the new voltage, and sends PS_RDY once the supply is stable; otherwise it replies with Reject (or Wait).
- The sink can renegotiate at any time. If the laptop's battery fills up, it might drop the request to 15 V at 2 A to reduce losses. If the user plugs in a second high-power accessory, the sink might request a higher PDO. The source tracks the state machine and keeps both sides in sync.
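The messages in that exchange carry 32-bit data objects. This sketch encodes Fixed Supply PDOs and a fixed-supply Request Data Object using the layouts as we understand them from the USB PD 3.x specification (voltage in 50 mV units in bits 19:10, current in 10 mA units in bits 9:0, object position in bits 31:28 of the request); verify bit positions against the spec before relying on them. All flag bits are left at zero, and the charger and request values are the hypothetical ones from the list above.

```python
def encode_fixed_pdo(millivolts: int, milliamps: int) -> int:
    """Fixed Supply PDO: type 00 in bits 31:30, 50 mV units in bits 19:10,
    10 mA units in bits 9:0. Capability flags (bits 29:20) left at zero."""
    return ((millivolts // 50) << 10) | (milliamps // 10)


def decode_fixed_pdo(pdo: int) -> tuple[int, int]:
    if pdo >> 30 != 0:
        raise ValueError("not a fixed-supply PDO")
    return ((pdo >> 10) & 0x3FF) * 50, (pdo & 0x3FF) * 10


def encode_fixed_request(position: int, op_ma: int, max_ma: int) -> int:
    """Fixed Request Data Object: 1-based object position in bits 31:28,
    operating and maximum operating current in 10 mA units."""
    return (position << 28) | ((op_ma // 10) << 10) | (max_ma // 10)


# The hypothetical 100 W charger's Source_Capabilities from the text.
source_caps = [encode_fixed_pdo(v, a) for v, a in
               [(5000, 3000), (9000, 3000), (15000, 3000), (20000, 5000)]]
request = encode_fixed_request(position=4, op_ma=5000, max_ma=5000)
print([decode_fixed_pdo(p) for p in source_caps], hex(request))
```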
There are several subtleties. The cable itself has to be capable of the requested power level, which is why cables rated above 3 A (and all full-featured cables that carry SuperSpeed or USB4 data) contain a small electronic marker chip (E-Marker), powered by VCONN from the port and queried over the CC line. The E-Marker reports the cable's maximum current and voltage, its signal latency (a proxy for length), the highest USB data rate it supports, and whether it is active or passive. If you plug in a cheap cable without an E-Marker, the source will fall back to 3 A maximum regardless of its own capability, because it cannot trust the cable to carry more.
PPS (Programmable Power Supply) mode adds another twist: instead of stepping through discrete voltages, the sink can request any voltage, in 20 mV steps, within a range the source advertises (for example 3.3 V to 11 V, or 3.3 V to 21 V), with the current set in 50 mA steps. This is critical for fast charging of phones, because modern battery management ICs want to drive the battery directly with a tightly controlled voltage that tracks the cell state. PPS lets the phone's battery management IC pick the exact voltage it wants, moment by moment, and bypass the main DC-DC converter stage inside the phone. That is why PPS direct charging keeps the phone cooler than older schemes where the charger provided a fixed voltage and the phone converted it internally: most of the conversion loss moves out of the phone and into the charger.
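A PPS request uses a different Request Data Object layout than a fixed one: output voltage in 20 mV units in bits 20:9 and operating current in 50 mA units in bits 6:0, with the object position of the PPS APDO in bits 31:28. A sketch under that assumed layout (check it against the USB PD 3.x specification); the function name, the APDO at position 5 and its 3.3 V to 11 V range are hypothetical.

```python
def encode_pps_request(position: int, millivolts: int, milliamps: int,
                       apdo_min_mv: int, apdo_max_mv: int) -> int:
    """PPS Request Data Object (assumed layout, see lead-in).

    Rounds the voltage down to a 20 mV step and the current down to a 50 mA
    step, and rejects voltages outside the range the source advertised.
    """
    mv = millivolts - millivolts % 20
    if not apdo_min_mv <= mv <= apdo_max_mv:
        raise ValueError("voltage outside the source's PPS range")
    return (position << 28) | ((mv // 20) << 9) | (milliamps // 50)


# A phone asking for 8.46 V at 3 A from a hypothetical 3.3-11 V APDO at position 5.
rdo = encode_pps_request(5, 8460, 3000, apdo_min_mv=3300, apdo_max_mv=11000)
print(hex(rdo), ((rdo >> 9) & 0xFFF) * 20, (rdo & 0x7F) * 50)  # 0x50034e3c 8460 3000
```

Because the phone re-sends such requests as the cell voltage rises, the 20 mV granularity is what lets it track the battery closely without an extra conversion stage.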
EPR (Extended Power Range), added in USB PD 3.1, introduces fixed 28 V, 36 V, and 48 V levels and lets a source push up to 240 watts (48 V at 5 A) through a USB-C cable. That level of power requires an EPR-capable cable with an E-Marker advertising EPR support, an EPR-capable source, and an EPR-capable sink; otherwise the negotiation stays within the 100 watt Standard Power Range. EPR cables look identical to ordinary USB-C cables but are built and rated for 50 V and 5 A, and they are noticeably more expensive.
Alternate Modes: The Connector Becomes Anything
The most remarkable feature of USB-C is that the connector can carry protocols that are not USB at all. This is done through alternate modes, negotiated over the CC pin and implemented by physically rerouting the high-speed lanes inside the host and device.
DisplayPort Alternate Mode is the most common. In DP alt mode, two or all four of the high-speed pairs in the USB-C cable are reassigned to carry DisplayPort main link data instead of USB 3.x SuperSpeed traffic. The AUX channel that DisplayPort uses for EDID reads and link training is carried on the SBU pins, while hot-plug detect is signalled with Power Delivery messages over CC. The USB 2.0 pair continues to work as normal, so a USB-C to DisplayPort adapter still carries the low-speed USB 2.0 bus alongside DisplayPort. This is why a single cable can drive a monitor and charge the laptop and transfer USB 2.0 traffic simultaneously: different pins are carrying different signals, all through the same connector.
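Entering an alternate mode follows a fixed discovery sequence of USB PD Structured VDMs over CC: Discover Identity, Discover SVIDs, Discover Modes, then Enter Mode. The sketch below walks that sequence against a simulated partner; the dictionary standing in for the partner's replies is hypothetical, and the SVIDs used (0xFF01 for DisplayPort, 0x8087 for Thunderbolt 3) are the commonly used values. It returns the steps as strings rather than building real messages.

```python
DP_SVID = 0xFF01   # Standard or Vendor ID for DisplayPort Alt Mode
TBT_SVID = 0x8087  # Intel's ID, used for Thunderbolt 3 alt mode


def enter_alt_mode(partner: dict, wanted_svid: int) -> list[str]:
    """Walk the Structured VDM discovery sequence against a simulated partner.

    `partner` is a stand-in for the port partner's replies:
    {"modal": bool, "svids": [...], "modes": {svid: [mode VDOs]}}.
    """
    log = ["Discover Identity"]
    if not partner.get("modal"):
        return log + ["partner has no alternate modes; stay in USB"]
    log.append("Discover SVIDs")
    if wanted_svid not in partner.get("svids", []):
        return log + [f"SVID {wanted_svid:#06x} not offered; stay in USB"]
    log.append(f"Discover Modes for {wanted_svid:#06x}")
    if not partner.get("modes", {}).get(wanted_svid):
        return log + ["no modes returned; stay in USB"]
    # Mode positions are 1-based; this sketch simply takes the first one.
    log.append(f"Enter Mode (object position 1) for {wanted_svid:#06x}")
    if wanted_svid == DP_SVID:
        # DisplayPort then exchanges its own commands before the lanes are muxed.
        log += ["DP Status Update", "DP Configure"]
    return log


monitor = {"modal": True, "svids": [DP_SVID], "modes": {DP_SVID: [1]}}
for step in enter_alt_mode(monitor, DP_SVID):
    print(step)
```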
Thunderbolt 3 and Thunderbolt 4 work similarly. When Thunderbolt mode is negotiated, the high-speed lanes carry Thunderbolt's own transport protocol, a switched fabric that tunnels PCIe and DisplayPort; USB devices on a Thunderbolt 3 dock usually sit behind a USB host controller inside the dock, reached over the tunnelled PCIe link. A Thunderbolt 3 port can carry DisplayPort and PCIe 3.0 x4 traffic at the same time, all multiplexed over the same two lanes. The switching is done by a Thunderbolt controller (on the host side, chips such as Intel's Titan Ridge or Maple Ridge, or a controller integrated into the CPU on recent Intel platforms; on the dock side, chips such as Goshen Ridge) that sits between the CPU's PCIe root complex and the USB-C port. On USB4, the same tunnelling architecture is part of the USB4 specification itself and can be implemented without a separate Thunderbolt chip.
The critical thing about alternate modes is that the USB host controller is not involved in carrying them. Once DP alt mode is negotiated, the host's display controller drives those pairs directly, not the USB controller. In the four-lane DisplayPort configuration, SuperSpeed is effectively turned off on that port from the USB stack's point of view; in the two-lane configuration, the remaining two pairs can still carry USB 3.x. The USB 2.0 link stays active either way because it runs on separate pins, so a USB 2.0 keyboard on the same cable still works. But with all four lanes assigned to DisplayPort, any transfer that would normally use SuperSpeed has to fall back to USB 2.0 until the alt mode is exited or reconfigured.
The Billboard class exists specifically to report alt mode status. A device that supports alternate modes can expose a Billboard interface over USB 2.0, listing the alt modes it can switch to and whether each one was entered; a device whose alt mode could not be entered, for example because the host does not support it, is required to expose one. The host uses it to show a "This USB device is not supported" notification instead of silently failing.
USB4, the most recent addition, builds this multiplexing into the protocol itself. A USB4 link can tunnel USB 3.x, DisplayPort, and PCIe at the same time (PCIe tunnelling is optional for USB4 hosts, although Thunderbolt 4 certification requires it), without switching the connector into a separate alternate mode. The USB4 router on each end handles the multiplexing, and a connection manager on the host sets up the tunnels and decides how to allocate bandwidth across them. At USB4 Gen 3x2 (40 Gbps symmetric) you might run 4K60 DisplayPort for a monitor, a PCIe link to an external SSD, and USB 3.2 Gen 2 for a keyboard and mouse, all through a single cable. At USB4 Gen 4 (80 Gbps) the total bandwidth doubles, which can be enough to drive an 8K monitor while an external GPU is talking PCIe.
The Host Controller: xHCI And What The Kernel Actually Talks To
On the host side, the operating system does not talk to the USB bus directly. It talks to a host controller, which on a PC is usually a PCI device, and which implements one of a handful of standard register interfaces. For USB 1.x and USB 2.0 there were OHCI, UHCI, and EHCI. For USB 3.0 and later there is xHCI, the eXtensible Host Controller Interface.
xHCI is the one that matters now. It unified the low-speed, full-speed, high-speed, and SuperSpeed support into a single register interface, so a single driver can handle every speed on every port. The xHCI spec is long and complex, but the core abstraction is the ring: a circular buffer of Transfer Request Blocks (TRBs) in system memory that the driver fills with commands and the controller walks through asynchronously. There is a command ring for host-issued commands (Enable Slot, Address Device, Configure Endpoint), an event ring for controller-issued events (port status change, transfer complete, error), and per-endpoint transfer rings for actual data movement.
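Ownership of ring entries is tracked with a cycle bit rather than with shared head and tail pointers. The toy model below captures that idea: the producer stamps each TRB with its cycle state, the consumer only takes entries whose cycle bit matches its own, and passing the Link TRB at the end of the ring flips both. It is a sketch under simplifying assumptions (no full-ring check, no real TRB formats), and the class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TRB:
    payload: object = None
    cycle: int = 0
    link: bool = False  # the last slot is a Link TRB pointing back to the start


class ProducerRing:
    """Producer side (the driver): stamps TRBs with the Producer Cycle State."""

    def __init__(self, size: int):
        self.slots = [TRB() for _ in range(size - 1)] + [TRB(link=True)]
        self.enqueue = 0
        self.pcs = 1  # Producer Cycle State starts at 1; fresh slots hold 0

    def push(self, payload) -> None:
        trb = self.slots[self.enqueue]
        trb.payload, trb.cycle = payload, self.pcs
        self.enqueue += 1
        if self.slots[self.enqueue].link:
            self.slots[self.enqueue].cycle = self.pcs  # hand the Link TRB over too
            self.enqueue = 0
            self.pcs ^= 1  # toggle on wrap, so last lap's entries look stale


class ConsumerRing:
    """Consumer side (the controller): tracks its Consumer Cycle State."""

    def __init__(self, ring: ProducerRing):
        self.ring, self.dequeue, self.ccs = ring, 0, 1

    def pop_ready(self) -> list:
        done = []
        while True:
            trb = self.ring.slots[self.dequeue]
            if trb.cycle != self.ccs:
                return done  # not handed over yet
            if trb.link:
                self.dequeue, self.ccs = 0, self.ccs ^ 1  # follow link, toggle
                continue
            done.append(trb.payload)
            self.dequeue += 1


ring = ProducerRing(4)
controller = ConsumerRing(ring)
ring.push("A"); ring.push("B")
print(controller.pop_ready())  # ['A', 'B']
ring.push("C"); ring.push("D")  # D wraps past the Link TRB
print(controller.pop_ready())  # ['C', 'D']
```

The appeal of this design is that neither side has to read the other's pointer on the hot path: the controller can tell a fresh TRB from a stale one just by its cycle bit.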
When an application calls read() on a USB mass storage block device, the block layer hands the request to the SCSI disk driver (sd), which turns it into a SCSI READ command. The usb-storage driver wraps that command in a Command Block Wrapper (devices that speak the newer USB Attached SCSI protocol are handled by the uas driver instead, which uses a different framing) and hands the work to the usbcore layer as a sequence of URBs (USB Request Blocks): a bulk OUT URB for the CBW, a URB for the data phase, and a bulk IN URB for the Command Status Wrapper. Each URB is submitted to the xHCI driver, which places Transfer Request Blocks (TRBs) into the appropriate transfer ring, rings the doorbell register for that endpoint, and returns. The controller walks the ring, schedules the transactions on the bus, moves data through its DMA engine into or out of the kernel's buffers, and writes completion events to the event ring. An interrupt fires, the driver processes the events, and the URBs complete, which completes the SCSI command, which completes the block layer request, which wakes up the application.
On the hot path from read() to "data in your buffer", the CPU never busy-waits for the bus. The controller runs independently, DMA is doing the heavy lifting, and the CPU only gets involved on enqueue and completion. This is how a single core can keep a USB 3.2 Gen 2x2 link busy at 20 Gbps without being pinned.
The xHCI model also makes it possible for the host to pause, resume, and reconfigure endpoints without resetting the bus, which is critical for dynamic scenarios like switching a USB-C port into alt mode or hotplugging a hub. The controller exposes a slot structure per device, and the driver can configure, deconfigure, and reconfigure endpoints on that slot without disturbing anything else on the bus.
Hubs: Topology And The Seven Layers
USB devices do not connect directly to the host controller in most cases. They connect through hubs, which are USB devices that route traffic between an upstream port (to the host) and several downstream ports (to other devices). The topology is a tree of at most seven tiers: the root hub (the ports on the host controller itself) is tier 1, there can be at most five external hubs in any path below it, and the device at the bottom occupies tier 7. That is the spec's "seven layers" limit. Separately, the 7-bit device address caps a single bus at 127 devices, hubs included.
A hub has its own control endpoint, its own descriptors, its own class (0x09), an interrupt IN endpoint on which it reports which ports have changed, and a small set of class-specific requests: GetPortStatus, SetPortFeature, and ClearPortFeature (resetting a port is just SetPortFeature with the PORT_RESET selector). The host drives hubs through these requests to manage downstream ports just as it manages the root hub ports. When a device plugs into a downstream hub port, the hub detects the connect event and flags it on its status-change endpoint, the host reads the port status, and the host then issues a reset and enumeration on that port just as it would on the root hub.
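The hub requests are ordinary control transfers addressed to a port. A sketch that builds GetPortStatus and SetPortFeature(PORT_RESET) setup packets and decodes the 4-byte reply (wPortStatus followed by wPortChange); the bit assignments are a subset of the USB 2.0 hub chapter, and the helper names and example values are hypothetical.

```python
import struct

# Selected wPortStatus and wPortChange bits for a USB 2.0 hub port.
PORT_STATUS_BITS = {0: "connection", 1: "enable", 2: "suspend", 3: "over_current",
                    4: "reset", 8: "power", 9: "low_speed", 10: "high_speed"}
PORT_CHANGE_BITS = {0: "c_connection", 1: "c_enable", 2: "c_suspend",
                    3: "c_over_current", 4: "c_reset"}

GET_STATUS, SET_FEATURE = 0x00, 0x03
PORT_RESET = 4  # feature selector


def get_port_status_setup(port: int) -> bytes:
    # 0xA3: device-to-host, class request, recipient = other (a hub port).
    return struct.pack("<BBHHH", 0xA3, GET_STATUS, 0, port, 4)


def set_port_reset_setup(port: int) -> bytes:
    # 0x23: host-to-device, class request, recipient = other.
    return struct.pack("<BBHHH", 0x23, SET_FEATURE, PORT_RESET, port, 0)


def decode_port_status(raw: bytes) -> tuple[set, set]:
    status, change = struct.unpack("<HH", raw)
    on = {name for bit, name in PORT_STATUS_BITS.items() if status >> bit & 1}
    changed = {name for bit, name in PORT_CHANGE_BITS.items() if change >> bit & 1}
    return on, changed


# A powered port that just saw a device connect: connection + power, c_connection.
print(decode_port_status(struct.pack("<HH", 0x0101, 0x0001)))
```

After seeing c_connection, a host would clear the change bit with ClearPortFeature, then send the reset request built above.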
Hubs matter in practice because many "USB-C docks" and most USB-C displays contain a hub internally. When you plug a laptop into a USB-C monitor with extra ports, the monitor exposes a USB hub that the laptop enumerates and uses to reach the downstream ports. The hub class is completely standard, so any OS can drive it without vendor drivers.
On SuperSpeed the hub situation is more complex. A USB 3 hub contains two logical hubs in one chip: a USB 2.0 hub (for the legacy pair) and a SuperSpeed hub (for the SuperSpeed lanes). They enumerate separately, and they route traffic independently. A USB 3 device plugged into a USB 3 hub normally connects through the SuperSpeed hub; a USB 2.0 device, or a USB 3 device that fails SuperSpeed link training, appears behind the USB 2.0 hub instead. The host sees each device on exactly one of the two paths.
Split Transactions: How High-Speed Hubs Tolerate Slow Devices
One of the uglier corners of USB 2.0 is how a high-speed hub handles a low-speed or full-speed device plugged into a downstream port. If the hub just forwarded the bus, every slow transaction would tie up the 480 Mbps upstream link for as long as it takes at 12 or 1.5 Mbps, starving high-speed devices on other ports. The spec solves this with split transactions, handled by a Transaction Translator (TT) inside the high-speed hub, and the mechanism is worth understanding because it explains a lot of odd behaviour on real hubs.
When a full-speed keyboard is plugged into a high-speed hub, the host cannot talk to it at 480 Mbps. But the link between the host and the hub is running at 480 Mbps. So the host sends a high-speed transaction to the hub itself, labelled as a "start-split" for a particular downstream port and endpoint. The hub buffers the request, translates it into a low-speed or full-speed transaction on the slower downstream port, collects the response, and then waits for the host to come back with a "complete-split" on the high-speed link to fetch the response. The slow transaction runs on the downstream bus in parallel with other high-speed traffic on the upstream bus, and neither one blocks the other.
This sounds clean, but the scheduling is fiddly. The host has to leave enough time between the start-split and complete-split for the slow downstream transaction to finish, and it has to schedule those splits around the 125 µs microframe boundaries. If the scheduling is wrong, the split times out and the transfer is retried. Some older hubs handle splits poorly, and cheaper hubs often have a single TT shared by every downstream port, so two full-speed devices on different ports compete for the same 12 Mbps. Plugging a full-speed USB audio interface into such a hub can produce dropouts even though its isochronous endpoints are supposed to get guaranteed bandwidth. The underlying problem is almost always that the split schedule, in the host or in the hub's TT, is not keeping up. On Linux, the EHCI driver schedules split transactions in software and carries a fair amount of code for it; on xHCI controllers much of that scheduling moves into the controller hardware, which works from the hub and TT details the driver supplies.
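A back-of-the-envelope calculation shows why the timing is tight. The sketch below is plain arithmetic with hypothetical function names: it converts a full-speed payload into 125 µs microframes of downstream bus time. It ignores token, handshake, and inter-packet overhead, so real budgets are larger. The 7/6 factor is the worst case for bit stuffing, which inserts a 0 after six consecutive 1 bits.

```python
import math

MICROFRAME_US = 125          # high-speed microframe, in microseconds
FULL_SPEED_BPS = 12_000_000  # full-speed signalling rate, bits per second
BITS_PER_MICROFRAME = FULL_SPEED_BPS * MICROFRAME_US // 1_000_000  # 1500

def microframes_needed(payload_bytes: int, worst_case_stuffing: bool = True) -> int:
    """Microframes of downstream bus time for one full-speed data payload.

    Ignores token, handshake, and inter-packet gaps, so real budgets are larger.
    Bit stuffing adds at most one bit per six, hence the 7/6 worst case.
    """
    bits = payload_bytes * 8
    if worst_case_stuffing:
        bits = math.ceil(bits * 7 / 6)
    return math.ceil(bits / BITS_PER_MICROFRAME)

print(BITS_PER_MICROFRAME / 8)  # 187.5 bytes of full-speed time per microframe
for size in (8, 64, 188, 1023):
    print(size, microframes_needed(size), microframes_needed(size, worst_case_stuffing=False))
```

The 187.5-byte figure is roughly where the ~188-byte per-microframe full-speed budget used in split scheduling comes from. A maximum-size 1023-byte full-speed isochronous packet spans six to seven microframes of downstream time, so the host has to spread a single packet's split transactions across most of a 1 ms frame.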
SuperSpeed traffic sidesteps all of this. USB 3 hubs carry SuperSpeed on physically separate lanes, and the two logical hubs inside a single USB 3 hub are independent. A full-speed device on a downstream port still goes through split transactions on the USB 2.0 hub's transaction translator, but that traffic never touches the SuperSpeed hub, so SuperSpeed devices on the same hub are unaffected by it.
Debugging USB: usbmon, Wireshark, And Snooping The Bus
Because USB is a host-controlled bus with a well-defined packet structure, it is possible to capture and decode, in software, every transfer the host sends and receives. On Linux the kernel exposes this through usbmon, a small driver that hooks into usbcore and records every URB (USB Request Block) submission and completion. It publishes those records as text files under debugfs (/sys/kernel/debug/usb/usbmon/) and as binary character devices (/dev/usbmonN, one per bus, with usbmon0 covering all buses). The text output can be read directly, but the more useful route is to capture in Wireshark, which reads the binary interface through libpcap and has a full USB dissector that knows how to decode control transfers, mass storage CBW/CSW pairs, HID reports, CDC messages, and a long list of class-specific payloads.
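Wireshark can label those CBW/CSW pairs because their layout is fixed by the USB Mass Storage Class Bulk-Only Transport specification: a 31-byte Command Block Wrapper carries each SCSI command out, and a 13-byte Command Status Wrapper reports the result. The Python sketch below packs and unpacks both with the standard struct module. The helper names are hypothetical, and the code only builds the wrappers; it does not send them to a device.

```python
import struct

CBW_SIGNATURE = 0x43425355  # b"USBC" when packed little-endian
CSW_SIGNATURE = 0x53425355  # b"USBS"
CSW_STATUS = {0: "passed", 1: "failed", 2: "phase error"}

def build_cbw(tag: int, data_length: int, data_in: bool, lun: int, cdb: bytes) -> bytes:
    """Pack a 31-byte Bulk-Only Transport Command Block Wrapper."""
    if not 1 <= len(cdb) <= 16:
        raise ValueError("a CBW carries a SCSI command block of 1 to 16 bytes")
    flags = 0x80 if data_in else 0x00  # bit 7 set: data phase is device-to-host
    # '<' means little-endian with no padding; '16s' zero-fills the fixed CB field.
    return struct.pack("<IIIBBB16s", CBW_SIGNATURE, tag, data_length,
                       flags, lun & 0x0F, len(cdb), cdb)

def parse_csw(raw: bytes, expected_tag: int) -> tuple:
    """Unpack a 13-byte Command Status Wrapper into (data residue, status)."""
    signature, tag, residue, status = struct.unpack("<IIIB", raw)
    if signature != CSW_SIGNATURE or tag != expected_tag:
        raise ValueError("not the CSW for this command")
    return residue, CSW_STATUS.get(status, f"reserved status {status}")

# SCSI INQUIRY (opcode 0x12) asking for 36 bytes of standard inquiry data.
inquiry_cdb = bytes([0x12, 0x00, 0x00, 0x00, 36, 0x00])
cbw = build_cbw(tag=7, data_length=36, data_in=True, lun=0, cdb=inquiry_cdb)
print(cbw.hex())

# What a successful reply would look like: same tag, no residue, status 0.
csw = struct.pack("<IIIB", CSW_SIGNATURE, 7, 0, 0)
print(parse_csw(csw, expected_tag=7))  # (0, 'passed')
```

The tag is how host and capture tool pair each CSW with its CBW, which is why a mismatched tag is treated as an error here.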
A typical debugging session looks like this. You load the module with modprobe usbmon, check /sys/kernel/debug/usb/devices (or lsusb) to find which bus your device is on, start Wireshark capturing on the matching usbmonN interface, plug in the device, and watch the enumeration dance unfold request by request: Get Device Descriptor, port reset, Set Address, Get Device Descriptor again in full, Get Configuration Descriptor, Set Configuration, and then whatever class-specific traffic the driver starts up. If a device is behaving badly, you can often spot the problem in the capture: a stalled endpoint, a malformed descriptor, a missing LANGID string, a bMaxPacketSize0 or wMaxPacketSize value the device cannot actually support. Wireshark decodes most of it in a human-readable form, which saves you from staring at raw hex.
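The enumeration requests are easy to pick out even in usbmon's own text format, which the kernel documents in its usbmon guide. The sketch below assumes the layout of the text files (such as 1u) under /sys/kernel/debug/usb/usbmon/: URB tag, timestamp, event type, a type:bus:device:endpoint address word, then, for control submissions, an "s" followed by the five setup-packet fields. It decodes only a handful of standard requests. The sample lines are hand-written in that format rather than taken from a real capture, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

STANDARD_REQUESTS = {
    0x00: "GET_STATUS", 0x01: "CLEAR_FEATURE", 0x03: "SET_FEATURE",
    0x05: "SET_ADDRESS", 0x06: "GET_DESCRIPTOR", 0x08: "GET_CONFIGURATION",
    0x09: "SET_CONFIGURATION",
}
DESCRIPTOR_TYPES = {0x01: "DEVICE", 0x02: "CONFIGURATION", 0x03: "STRING", 0x0F: "BOS"}

@dataclass
class UsbmonEvent:
    tag: str
    timestamp_us: int
    event: str       # 'S' submission, 'C' callback, 'E' submission error
    xfer_type: str   # 'C' control, 'Z' isochronous, 'I' interrupt, 'B' bulk
    direction: str   # 'i' device-to-host, 'o' host-to-device
    bus: int
    device: int
    endpoint: int
    setup: Optional[tuple] = None  # (bmRequestType, bRequest, wValue, wIndex, wLength)

def parse_line(line: str) -> UsbmonEvent:
    words = line.split()
    kind, bus, dev, ep = words[3].split(":")
    event = UsbmonEvent(words[0], int(words[1]), words[2], kind[0], kind[1],
                        int(bus), int(dev), int(ep))
    # Control submissions replace the status word with 's' plus five setup words.
    if len(words) >= 10 and words[4] == "s":
        event.setup = tuple(int(w, 16) for w in words[5:10])
    return event

def describe_setup(setup: tuple) -> str:
    bm_request_type, b_request, w_value, w_index, w_length = setup
    if ((bm_request_type >> 5) & 0x3) != 0:  # bits 6..5: 0 means a standard request
        return f"class/vendor request 0x{b_request:02x}"
    name = STANDARD_REQUESTS.get(b_request, f"request 0x{b_request:02x}")
    if b_request == 0x06:  # descriptor type sits in the high byte of wValue
        kind = DESCRIPTOR_TYPES.get(w_value >> 8, f"type 0x{w_value >> 8:02x}")
        return f"{name} {kind} (index {w_value & 0xFF}, {w_length} bytes)"
    if b_request in (0x05, 0x09):  # new address / configuration value in wValue
        return f"{name} {w_value}"
    return name

capture = [
    "ffff8801 1000 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <",
    "ffff8802 1200 S Co:1:000:0 s 00 05 0007 0000 0000 0",
    "ffff8803 1400 S Ci:1:007:0 s 80 06 0200 0000 0009 9 <",
    "ffff8804 1600 S Co:1:007:0 s 00 09 0001 0000 0000 0",
]
for line in capture:
    ev = parse_line(line)
    print(ev.device, describe_setup(ev.setup))
```

Reading the printed lines top to bottom shows the device moving from the default address 0 to address 7 before its configuration is read and selected.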
For the physical layer there are dedicated hardware analysers from Total Phase (Beagle), LeCroy/Teledyne, and Ellisys. They sit inline on the cable and capture the actual wire signalling, which is the only way to debug problems below usbcore, like link training failures, electrical glitches, or timing violations on SuperSpeed. They are expensive (a SuperSpeed-capable analyser starts around 5,000 euros) but indispensable when you are bringing up a new USB device or chasing an intermittent cable issue.
USB Gadget Mode: Being The Device Instead Of The Host
Everything in the article so far has assumed you are the host. But plenty of Linux machines sit on the other end of the cable, and the kernel has a framework for implementing device-side USB behaviour called USB Gadget. A Raspberry Pi Zero or a BeagleBone Black can pretend to be a mass storage device, a serial port, an Ethernet adapter, a MIDI device, or a webcam, because the SoC has a USB controller that supports device mode and the kernel has a driver for it.
The gadget API exposes function drivers that implement each class, and several functions can be combined into a single composite device, either with the legacy g_multi module, which bundles a fixed set, or with the more flexible ConfigFS interface (backed by libcomposite), which lets you pick your own. Plug a Pi Zero into a laptop and it can appear simultaneously as a serial console (CDC-ACM), an Ethernet adapter (RNDIS, CDC-ECM, or CDC-NCM), and a mass storage device pointing at a disk image file. The laptop sees one USB device exposing three functions, each with its own interface or group of interfaces, and loads the matching class driver for each.
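Composing such a gadget through ConfigFS is mostly a matter of creating directories, writing attribute files, and symlinking functions into a configuration. The Python sketch below assumes the standard layout under /sys/kernel/config/usb_gadget, root privileges, and a loaded libcomposite module. The gadget name, strings, and disk image path are placeholders, and 0x1d6b:0x0104 is the Linux Foundation composite-gadget ID commonly used for experiments, not for products. Pointing `root` at a scratch directory, as the last lines do, dry-runs the layout without touching the kernel.

```python
import os
import tempfile
from typing import Optional

FUNCTIONS = ("acm.usb0", "ncm.usb0", "mass_storage.usb0")  # serial, Ethernet, storage

def write_attr(path: str, value: str) -> None:
    # ConfigFS attributes are plain files: one write sets the value.
    with open(path, "w") as f:
        f.write(value + "\n")

def build_composite_gadget(root: str, name: str, disk_image: str,
                           udc: Optional[str] = None) -> str:
    """Sketch: one gadget, one configuration, three functions linked into it."""
    g = os.path.join(root, name)
    os.makedirs(os.path.join(g, "strings", "0x409"), exist_ok=True)  # 0x409 = en-US
    write_attr(os.path.join(g, "idVendor"), "0x1d6b")   # Linux Foundation
    write_attr(os.path.join(g, "idProduct"), "0x0104")  # composite gadget (testing)
    write_attr(os.path.join(g, "strings", "0x409", "manufacturer"), "Example")
    write_attr(os.path.join(g, "strings", "0x409", "product"), "Pi Zero composite")

    config = os.path.join(g, "configs", "c.1")
    os.makedirs(os.path.join(config, "strings", "0x409"), exist_ok=True)
    write_attr(os.path.join(config, "strings", "0x409", "configuration"), "ACM+NCM+MSC")
    write_attr(os.path.join(config, "MaxPower"), "250")  # milliamps

    for function in FUNCTIONS:
        os.makedirs(os.path.join(g, "functions", function), exist_ok=True)
    # The kernel pre-creates lun.0; exist_ok also covers the dry-run case.
    lun = os.path.join(g, "functions", "mass_storage.usb0", "lun.0")
    os.makedirs(lun, exist_ok=True)
    write_attr(os.path.join(lun, "file"), disk_image)

    for function in FUNCTIONS:
        link = os.path.join(config, function)
        if not os.path.islink(link):
            # Linking a function into a configuration adds its interfaces to it.
            os.symlink(os.path.join(g, "functions", function), link)

    if udc:  # a controller name from /sys/class/udc; writing it binds the gadget
        write_attr(os.path.join(g, "UDC"), udc)
    return g

# Dry run into a scratch directory instead of /sys/kernel/config/usb_gadget.
scratch = tempfile.mkdtemp()
gadget = build_composite_gadget(scratch, "g1", "/home/pi/disk.img")
print(sorted(os.listdir(os.path.join(gadget, "configs", "c.1"))))
```

Nothing reaches the host until a UDC name is written to the gadget's UDC file; at that moment the laptop sees a new device and starts enumerating it.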
This is the same framework Android phones use for USB debugging. When you enable ADB over USB, the phone adds a vendor-specific ADB interface (class 0xFF, subclass 0x42, protocol 0x01) to its USB gadget configuration, often alongside MTP. The adbd daemon on the phone listens on that interface, and the adb client on the desktop talks to it via libusb. No special hardware is involved; ADB is literally just a USB gadget function plus a protocol running on top.
Security: BadUSB, Juice Jacking, And Data Role Swap
USB was designed in an era when "plug in a device" meant "trust it". The enumeration process gives the device enormous leeway to describe itself however it likes, and the host obediently loads the matching driver. This was fine when USB devices were keyboards and flash drives and everyone assumed the user knew what they were plugging in. It is less fine now.
BadUSB, the 2014 attack by Karsten Nohl and Jakob Lell, is the classic illustration. They showed that many USB flash drives contain a reprogrammable microcontroller that handles the USB side of the device. The firmware on that microcontroller decides which descriptors the drive presents, and replacing it lets the drive present a different class on every insert. Plug in what looks like a flash drive and it enumerates as an HID keyboard, which the operating system trusts implicitly, then types a command into a shell and installs a payload. Or it enumerates as an Ethernet adapter with a DHCP server and redirects your DNS. Or it stays a flash drive but quietly modifies files as they are read, for instance to infect a machine that boots from it. The host has no reliable way to distinguish a BadUSB device from a real keyboard, because USB has no cryptographic identity for devices. The VID and PID can be forged, the serial number string can be forged, and the descriptors can be constructed to match any real device.
Partial mitigations exist. USB Type-C Authentication, a separate optional specification that can run over either the PD channel or the USB data bus, lets the host verify a device's certificate chain and a signed challenge before allowing certain operations. It has seen almost no adoption, largely because it adds cost and the ecosystem of certificates never materialised. Desktop environments can ask the user to confirm or block unknown devices; GNOME, for instance, integrates with usbguard to block new USB devices while the screen is locked. Linux has usbguard, a userspace daemon that allowlists devices by their attributes and interface classes and blocks everything else until the user explicitly permits it. None of these are a full solution, and BadUSB remains a real risk at security-sensitive sites, where the usual mitigation is physically filling USB ports with epoxy on machines that should not accept removable media.
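Interface-class policies of the kind usbguard supports can be illustrated in a few lines. The sketch below is not usbguard's rule engine; it is a hypothetical Python function, in the spirit of the example rules in usbguard's documentation, that allows plain mass storage, rejects mass storage combined with HID, wireless-controller, or CDC interfaces (a typical BadUSB fingerprint), and asks the user about everything else. It cannot catch a device that presents only a keyboard, which is exactly why such measures are partial.

```python
from typing import Iterable, Optional, Tuple

Triple = Tuple[int, int, int]                                 # (class, subclass, protocol)
Pattern = Tuple[Optional[int], Optional[int], Optional[int]]  # None matches anything

MASS_STORAGE: Pattern = (0x08, None, None)
# Interface kinds that rarely belong on the same device as mass storage.
SUSPICIOUS_COMPANIONS = [
    (0x03, 0x00, None),  # HID without boot subclass
    (0x03, 0x01, None),  # HID boot interface: keyboards and mice
    (0xE0, None, None),  # wireless controller class (used by RNDIS networking)
    (0x02, None, None),  # CDC: modems, serial ports, Ethernet control
]

def matches(triple: Triple, pattern: Pattern) -> bool:
    return all(p is None or p == t for t, p in zip(triple, pattern))

def verdict(interfaces: Iterable[Triple]) -> str:
    ifaces = list(interfaces)
    storage = [i for i in ifaces if matches(i, MASS_STORAGE)]
    if storage and any(matches(i, p) for i in ifaces for p in SUSPICIOUS_COMPANIONS):
        return "reject"
    if ifaces and len(storage) == len(ifaces):
        return "allow"  # nothing but mass storage interfaces
    return "ask-user"

print(verdict([(0x08, 0x06, 0x50)]))                      # allow
print(verdict([(0x08, 0x06, 0x50), (0x03, 0x01, 0x01)]))  # reject
print(verdict([(0x03, 0x01, 0x01)]))                      # ask-user
```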
Juice jacking is a related concern at public charging stations. A USB-A port in an airport or train station in Frankfurt or Amsterdam could, in principle, be wired to a hidden host that enumerates your phone as a device and attempts to pull data. In practice, modern phones (Android and iOS alike) require explicit user permission before allowing data transfer on a USB connection, and the default state is "charging only", which switches the phone's USB gadget configuration to one that exposes no data interfaces. The risk has been largely mitigated at the OS level, but the attack is still possible on older devices and on jailbroken phones.
Data role swap on USB-C is a subtler issue. USB Power Delivery lets the two ends swap host and device roles with a DR_Swap message, so a malicious charger or dock can request a swap after the initial connection. A phone that initially connected as a device ("I am a peripheral") can find itself acting as the host, with the "charger" on the other end now enumerating as a keyboard and typing into the phone. Modern phones will refuse DR_Swap requests unless the user has opted in, but the USB-C protocol permits them, and older firmware does not always handle them safely.
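On the wire, DR_Swap is a USB PD control message: a 16-bit header with no data objects and message type 0x09. The sketch below decodes that header layout and applies a hypothetical phone-side policy that rejects the swap unless the user has opted in. The field positions follow the PD message header for SOP packets; the function names and the policy itself are illustrative assumptions, not any phone's actual firmware.

```python
from dataclasses import dataclass

CONTROL_MESSAGES = {0x01: "GoodCRC", 0x03: "Accept", 0x04: "Reject",
                    0x06: "PS_RDY", 0x09: "DR_Swap", 0x0A: "PR_Swap"}

@dataclass
class PdHeader:
    message_type: int      # bits 4..0
    port_data_role: int    # bit 5 (SOP): 0 = UFP (device), 1 = DFP (host)
    spec_revision: int     # bits 7..6: 0b01 = PD 2.0, 0b10 = PD 3.x
    port_power_role: int   # bit 8 (SOP): 0 = sink, 1 = source
    message_id: int        # bits 11..9
    num_data_objects: int  # bits 14..12; 0 marks a control message
    extended: bool         # bit 15

def parse_header(raw: int) -> PdHeader:
    return PdHeader(
        message_type=raw & 0x1F,
        port_data_role=(raw >> 5) & 0x1,
        spec_revision=(raw >> 6) & 0x3,
        port_power_role=(raw >> 8) & 0x1,
        message_id=(raw >> 9) & 0x7,
        num_data_objects=(raw >> 12) & 0x7,
        extended=bool((raw >> 15) & 0x1),
    )

def respond(raw: int, user_allows_data_swap: bool) -> str:
    """Hypothetical policy: reply to DR_Swap with Accept or Reject."""
    h = parse_header(raw)
    is_control = h.num_data_objects == 0 and not h.extended
    if is_control and CONTROL_MESSAGES.get(h.message_type) == "DR_Swap":
        return "Accept" if user_allows_data_swap else "Reject"
    return "ignored by this policy"

# A charger that is currently source and host (DFP) asks to swap data roles.
swap_request = 0x09 | (1 << 5) | (0b10 << 6) | (1 << 8) | (2 << 9)
print(parse_header(swap_request))
print(respond(swap_request, user_allows_data_swap=False))  # Reject
```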
An End-To-End Trace: Plugging In A Stick
Putting all of the layers together, here is what actually happens when Katerina plugs a USB 3.2 flash drive into her Lisbon laptop:
- Mechanical connect. The USB-C plug mates with the socket. The CC pins see the source-sink resistor pattern and decide which side is which.
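- CC negotiation. The laptop acts as the source: once its CC pins see the drive's pull-down resistor, it switches 5 V onto VBUS. A simple bus-powered flash drive usually does not speak USB PD at all; it just draws what it needs within the USB 3.x default budget of 900 mA at 5 V. (A PD-capable sink would instead wait for the source's Source_Capabilities message and answer with a Request.)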
- Orientation detect. The laptop notes which of its two CC pins (CC1 or CC2) sees the drive's pull-down, which tells it which way up the plug went in, and switches its SuperSpeed mux so the lanes reach the correct pins on the port.
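- SuperSpeed link training. The link state machines on both ends start in Rx.Detect, move through Polling, where they exchange LFPS (Low-Frequency Periodic Signalling) bursts and then training sequences to equalise their receivers and lock onto symbol boundaries, and settle in U0, the active state. After about 100 ms the SuperSpeed link is up.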
- No USB 2.0 attach. Because the SuperSpeed link came up, the drive leaves its USB 2.0 pair idle: a USB 3 device connects on only one bus at a time. Had link training failed, it would fall back to pulling up D+, and the root hub would see a full-speed attach, reset the port, and let the drive chirp up to high speed on the USB 2.0 pair.
- SuperSpeed enumeration. The root hub reports a SuperSpeed connection, and the host resets the port and begins reading the drive's descriptors over the SuperSpeed link.
- Address assignment and configuration. The host assigns an address, reads the full configuration descriptor, and picks the one configuration the device offers. The configuration contains one interface with class 0x08 (mass storage), subclass 0x06 (SCSI), protocol 0x50 (BOT, Bulk-Only Transport), and two bulk endpoints, one IN and one OUT.
- Class driver match. The kernel matches 0x08/0x06/0x50 against its driver table and loads usb-storage.
- SCSI probe. The SCSI layer sends an INQUIRY through usb-storage, which wraps each command in Bulk-Only Transport packets, and gets back a vendor string, a product string, and a revision. The disk driver then issues TEST UNIT READY to confirm the device is responsive and READ CAPACITY to learn the block count.
- Block device creation. The kernel creates /dev/sdb, the block layer reads the first few sectors, partition scanning finds a GPT with an EXT4 partition, and udev processes the new devices and sets up the corresponding device nodes and symlinks.
- Automount. The desktop environment's auto-mounter sees the new block device, mounts it, and opens a file manager window on the new mount point.
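The "Address assignment and configuration" and "Class driver match" steps boil down to walking the configuration descriptor and looking up each interface's (class, subclass, protocol) triple. The Python sketch below builds the descriptor such a drive might return (one 08/06/50 interface, two bulk endpoints, each followed by a SuperSpeed endpoint companion descriptor) and walks it. The driver table is a hypothetical simplification: Linux actually matches usb_device_id entries with per-field match flags, and a drive that also offers USB Attached SCSI would typically end up bound to uas instead.

```python
import struct

# Hypothetical, simplified driver table keyed on exact interface triples.
DRIVER_TABLE = {
    (0x08, 0x06, 0x50): "usb-storage",  # mass storage, SCSI, Bulk-Only Transport
    (0x08, 0x06, 0x62): "uas",          # mass storage, SCSI, USB Attached SCSI
}

def walk_config(raw: bytes) -> list:
    """Return the interfaces in a configuration descriptor, with their endpoints."""
    interfaces = []
    offset = 0
    while offset + 2 <= len(raw):
        length, dtype = raw[offset], raw[offset + 1]
        if length < 2 or offset + length > len(raw):
            raise ValueError(f"malformed descriptor at offset {offset}")
        desc = raw[offset:offset + length]
        if dtype == 0x04:  # INTERFACE
            interfaces.append({"number": desc[2], "alt": desc[3],
                               "triple": (desc[5], desc[6], desc[7]),
                               "endpoints": []})
        elif dtype == 0x05 and interfaces:  # ENDPOINT: belongs to the last interface
            address, attributes, max_packet = struct.unpack_from("<BBH", desc, 2)
            interfaces[-1]["endpoints"].append((address, attributes & 0x03, max_packet))
        offset += length  # anything else (e.g. SuperSpeed companions) is skipped
    return interfaces

def descriptor(dtype: int, payload: bytes) -> bytes:
    return bytes([len(payload) + 2, dtype]) + payload

# What a SuperSpeed Bulk-Only flash drive's configuration might look like.
interface = descriptor(0x04, bytes([0, 0, 2, 0x08, 0x06, 0x50, 0]))
companion = descriptor(0x30, bytes([15, 0, 0, 0]))  # SuperSpeed endpoint companion
ep_in = descriptor(0x05, struct.pack("<BBHB", 0x81, 0x02, 1024, 0))   # bulk IN 1
ep_out = descriptor(0x05, struct.pack("<BBHB", 0x02, 0x02, 1024, 0))  # bulk OUT 2
body = interface + ep_in + companion + ep_out + companion
# bMaxPower is in 8 mA units on SuperSpeed, so 112 means 896 mA.
config = descriptor(0x02, struct.pack("<HBBBBB", 9 + len(body), 1, 1, 0, 0x80, 112)) + body

for iface in walk_config(config):
    driver = DRIVER_TABLE.get(iface["triple"], "no match in this table")
    print(iface["number"], [hex(ep[0]) for ep in iface["endpoints"]], driver)
```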
All of that happens in well under two seconds on a fast laptop, and most of it is invisible to the user. The only thing they see is a window popping up. But every step in the list is required, and every step has a fallback path if something goes wrong. Skipping any one of them means the drive does not work.
Where USB Is Going
USB has been evolving on a roughly three-year cycle for a decade. Where the spec is likely to be interesting in the next few years:
USB4 v2 (80 Gbps symmetric, 120/40 Gbps asymmetric) is starting to ship on laptops and Thunderbolt 5 docks in 2024 and 2025. It is fast enough to run a single 8K display tunnelled alongside a 40 Gbps PCIe link. The signalling moves to PAM3 to squeeze more bits per symbol, and the cables need tighter tolerances and, for longer runs, active retimers.
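Why PAM3 helps is a matter of counting. A three-level symbol carries at most log2(3), about 1.585 bits, against 1 bit for two-level NRZ, and practical encoders map a fixed group of bits onto a fixed group of symbols. The sketch below, with hypothetical helper names, computes with exact integer arithmetic the largest whole number of bits that fits in n PAM3 symbols. Seven symbols, for example, give 3^7 = 2187 ≥ 2^11 combinations, so 11 bits fit, about 1.571 bits per symbol.

```python
import math

def bits_per_symbol(levels: int) -> float:
    # Upper bound on information per symbol with `levels` signal levels.
    return math.log2(levels)

def max_bits_in_symbols(levels: int, symbols: int) -> int:
    # Largest b with 2**b <= levels**symbols, using exact integer arithmetic.
    return (levels ** symbols).bit_length() - 1

print(f"PAM3 limit: {bits_per_symbol(3):.3f} bits/symbol")
for n in range(1, 9):
    bits = max_bits_in_symbols(3, n)
    print(f"{n} symbols -> {bits} bits ({bits / n:.3f} bits/symbol)")
```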
USB PD 3.1 EPR has not been widely adopted yet. Almost no laptops ship with 48 V power profiles, and almost no chargers offer them, but the spec is in place and the hardware exists. Expect to see EPR on workstation-class laptops and portable gaming machines, where a single cable that delivers 200+ watts genuinely beats a barrel connector.
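The 240 W ceiling falls straight out of the Fixed Supply PDO fields a source advertises: voltage in 50 mV units and maximum current in 10 mA units, each a 10-bit field, with bits 31..30 set to 00 for a fixed supply. The sketch below encodes and decodes only those fields; all flag bits are left clear and the helper names are hypothetical. As far as these two fields go, EPR's 28, 36, and 48 V fixed offers use the same layout.

```python
def fixed_pdo(voltage_mv: int, current_ma: int) -> int:
    """Encode a Fixed Supply PDO (type bits 31..30 = 00, flag bits clear)."""
    if voltage_mv % 50 or current_ma % 10:
        raise ValueError("voltage uses 50 mV units and current 10 mA units")
    volt_units, amp_units = voltage_mv // 50, current_ma // 10
    if volt_units > 0x3FF or amp_units > 0x3FF:
        raise ValueError("value does not fit the 10-bit field")
    return (volt_units << 10) | amp_units

def decode_fixed_pdo(pdo: int) -> tuple:
    """Return (volts, amps, watts) from a Fixed Supply PDO."""
    if ((pdo >> 30) & 0x3) != 0:
        raise ValueError("not a Fixed Supply PDO")
    millivolts = ((pdo >> 10) & 0x3FF) * 50
    milliamps = (pdo & 0x3FF) * 10
    return millivolts / 1000, milliamps / 1000, millivolts * milliamps / 1_000_000

for mv, ma in [(5000, 3000), (20000, 5000), (48000, 5000)]:
    print(decode_fixed_pdo(fixed_pdo(mv, ma)))
```

The 10-bit voltage field tops out at 1023 × 50 mV = 51.15 V, so 48 V fits, and 48 V at 5 A gives the 240 W figure.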
The question of whether USB4 kills Thunderbolt as a separate brand is mostly answered: Thunderbolt 4 and Thunderbolt 5 are now effectively certified profiles of USB4. Intel still licenses the Thunderbolt name and tests for certain minimum capability levels, but under the hood the stack is the same.
Meanwhile, the old USB 2.0 link is still on every single USB-C cable on earth, running at 480 Mbps, and it is almost certainly going to stay there for the foreseeable future. There are too many devices that rely on it, too many low-speed use cases where it is more than enough, and no compelling reason to cut it. The most complicated and expensive consumer connector in the world still carries a slow half-duplex differential pair from 2000 for backwards compatibility with a mouse you bought in 2002.
That is the thing about USB. It is not really one bus. It is four generations of bus standards running on the same wires, a device model and class system that ties them all together, a power negotiation protocol that grew from 2.5 watts to 240 watts, and a multiplexing layer that lets the same connector carry DisplayPort, PCIe, and Thunderbolt alongside real USB data. Every part of it is more complicated than the cable suggests, and every part is necessary to make "just plug it in" work across so many kinds of devices at once. Next time you plug a stick into a laptop, remember how much happens in the second or so before the drive appears.