How BIOS, UEFI, and CSM Actually Work
The moment you press the power button on a laptop in Paris, an extraordinary sequence of events begins. Voltage rails stabilise in the right order. A small piece of firmware embedded in the chipset starts running from a reset vector baked into the CPU. Memory controllers initialise DRAM timings without having any RAM to use yet. Hundreds of hardware devices are probed, configured, and handed off. Tables are constructed in memory describing the machine's topology, power states, and interrupt routing. Only then, after all this invisible work, does a file called \EFI\Microsoft\Boot\bootmgfw.efi or \EFI\BOOT\BOOTX64.EFI get loaded and control finally leaves the firmware.
Most developers never think about any of this. The screen shows a vendor logo, then GRUB, then the login prompt. The entire firmware pipeline is hidden behind that logo, treated as "before the computer starts". In reality it is an operating system of its own, running code from a flash chip on the motherboard, with drivers, a shell, a file system abstraction, and a protocol-based plugin architecture. Understanding it makes a lot of otherwise mysterious problems (boot loops, dual-boot failures, Secure Boot errors, unexplained 10-second delays on cold boot) suddenly comprehensible.
This article walks through the whole story, from the reset vector to the operating system's entry point. It covers legacy BIOS for historical context, then spends most of its time on UEFI because that is what every modern PC, laptop, and server in the European market runs today. It also explains CSM, why it existed, and why it is being removed.
The Very First Instruction
When power is applied, the CPU does not know anything about where code lives. It has to start somewhere fixed. On an x86 CPU, that somewhere is the reset vector: physical address 0xFFFFFFF0, near the top of the 32-bit address space. That is 16 bytes below the 4 GiB mark. The CPU fetches its first instruction from that address.
But there is no RAM at 0xFFFFFFF0 when the machine starts. RAM has not been initialised yet. The address range near the top of memory is decoded by the chipset as "SPI flash", a small NOR flash chip soldered to the motherboard that holds the firmware image. The chipset's BIOS region mapping intercepts reads to the top 16 MiB (typical size) of the address space and redirects them to the flash chip via the SPI bus.
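The address arithmetic behind that mapping can be sketched in a few lines of C. This is only an illustration: the 16 MiB window size is the "typical" figure from the text, and the function and macro names are made up for the example rather than taken from any chipset documentation.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed for illustration: a 16 MiB flash part mapped so that its last
 * byte sits at the very top of the 32-bit physical address space. */
#define FLASH_SIZE   (16u * 1024u * 1024u)
#define FOUR_GIB     0x100000000ull
#define RESET_VECTOR 0xFFFFFFF0u

/* Translate a physical address into a byte offset inside the flash chip.
 * Returns false if the address is outside the decoded window. */
bool flash_offset_for_address(uint32_t phys, uint32_t *offset)
{
    uint64_t window_base = FOUR_GIB - FLASH_SIZE;   /* 0xFF000000 */
    if (phys < window_base)
        return false;
    *offset = (uint32_t)(phys - window_base);
    return true;
}
```

Under these assumptions the reset vector lands 16 bytes before the end of the flash image, which is why firmware build tools place the first jump instruction at the very tail of the file.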
So the first instruction is fetched directly from flash, over SPI, at a few megabytes per second. It is typically a far jump to the real entry point of the firmware. On x86, the CPU starts in 16-bit real mode regardless of whether the firmware is legacy BIOS or modern UEFI. The very first code runs in the same execution mode the original 8086 used in 1978. The firmware's job, among other things, is to transition the CPU through 16-bit real mode, 32-bit protected mode, and eventually 64-bit long mode before handing control to the operating system.
On ARM platforms, the story is similar in structure but different in detail. The CPU starts at a SoC-specific reset address, runs mask ROM code baked into silicon, and loads a first-stage bootloader from a predetermined storage device (usually eMMC or SPI flash). The same phases repeat: pre-RAM, post-RAM, device initialisation, operating system handoff. The vocabulary differs but the concepts line up.
Legacy BIOS: A Brief History
The original IBM PC BIOS, written by IBM in 1981, was a piece of 8088 assembly that fit in 8 KiB of ROM. Its responsibilities were tiny: initialise the display card, detect how much memory was present, load the first sector of the floppy disk, and jump to it. Everything else, from keyboard input to disk access, was provided as a set of software interrupt handlers that DOS programs could call directly.
The classic BIOS boot sequence looked like this:
- POST (Power-On Self Test). The BIOS ran a short series of hardware checks: CPU registers, memory, keyboard controller, display. Failures caused diagnostic beep codes. The one-beep-two-beep language vendors used for debugging was real, and it persisted well into the 2010s.
- Option ROM scanning. Add-in cards could provide their own BIOS code in small ROMs. The system BIOS looked for option ROM signatures (0x55 0xAA) in expansion ROM address space and jumped into them, giving each card a chance to initialise itself. This was how video cards, network cards, and RAID controllers extended the BIOS.
- Disk identification. The BIOS used INT 13h to read disk sectors. Each disk had a Master Boot Record at LBA 0: a 512-byte sector containing 446 bytes of bootloader code, 64 bytes of partition table (four 16-byte partition entries), and a 2-byte signature, 0x55 0xAA.
- MBR execution. The BIOS loaded the MBR into memory at 0x7C00 and jumped to it. The 446 bytes of code had to find the active partition, load its first sector (the volume boot record), and jump there. With 446 bytes, there was no room for file system parsing. The VBR had to continue the chain.
- Bootloader handoff. A typical Windows or Linux system ran through multiple stages: MBR → VBR → first-stage bootloader (GRUB stage 1.5 or NTLDR) → second-stage bootloader → operating system kernel. Each stage had more code and more knowledge.
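The MBR layout described in the list above is simple enough to parse by hand. The following is a minimal sketch of what the 446 bytes of boot code have to do logically: check the signature, then find the partition entry marked active. The function and macro names are illustrative; the offsets (table at byte 446, status flag in byte 0 of an entry, starting LBA in bytes 8 to 11) are the classic MBR layout.

```c
#include <stdint.h>
#include <stddef.h>

#define MBR_SIZE          512
#define MBR_TABLE_OFFSET  446   /* partition table follows the boot code */
#define MBR_ENTRY_SIZE    16
#define MBR_ACTIVE_FLAG   0x80

/* Read a little-endian 32-bit value from an unaligned byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the index (0-3) of the first active partition and store its
 * starting LBA, or -1 if the signature is wrong or nothing is active. */
int mbr_find_active(const uint8_t sector[MBR_SIZE], uint32_t *start_lba)
{
    if (sector[510] != 0x55 || sector[511] != 0xAA)
        return -1;
    for (int i = 0; i < 4; i++) {
        const uint8_t *entry = sector + MBR_TABLE_OFFSET + i * MBR_ENTRY_SIZE;
        if (entry[0] == MBR_ACTIVE_FLAG) {
            *start_lba = read_le32(entry + 8);  /* bytes 8-11: first LBA */
            return i;
        }
    }
    return -1;
}
```

Real boot code did the same thing in hand-written 16-bit assembly and then used INT 13h to load the sector at the returned LBA.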
This worked for a long time. It also accumulated increasingly painful limitations.
The 2.2 TB disk limit. MBR partition tables used 32-bit LBA values, capping addressable disks at 2 TiB. By 2010, consumer drives were crossing that line and the industry had no graceful answer inside the MBR scheme.
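The two figures are the same limit in different units. Assuming the traditional 512-byte sector, the arithmetic is:

```latex
2^{32}\ \text{sectors} \times 512\ \frac{\text{bytes}}{\text{sector}}
  = 2^{32} \times 2^{9}\ \text{bytes}
  = 2^{41}\ \text{bytes}
  = 2\ \text{TiB}
  \approx 2.2 \times 10^{12}\ \text{bytes}
```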
The four primary partition limit. MBR had exactly four partition slots. Extended partitions added a linked list trick to get more, but it was awkward and error-prone.
The 16-bit real mode entry. Every BIOS-booted operating system had to start in real mode and transition itself. This was fine for MS-DOS but increasingly absurd for 64-bit Linux and Windows, which had to write elaborate trampolines to get from 16-bit to 64-bit mode.
The BIOS interrupt interface. INT 13h and its friends assumed a PC/AT hardware model that did not survive the move to PCI Express, multiple disks with mixed sector sizes, GPU-assisted displays, and modern input devices. The interrupt interface became a thin translation layer over what the actual hardware wanted to do.
The no-framebuffer-at-boot problem. Legacy BIOS had no standard way to set up a high-resolution framebuffer. Early boot messages had to use VGA text mode, a 25x80 grid of ASCII characters drawn by the graphics card from a ROM font. Modern systems with 4K displays and no VGA compatibility fell back to blank screens or ugly text.
By the mid-2000s, BIOS was a relic wrapped in compatibility hacks. Something had to replace it.
Enter UEFI
UEFI (Unified Extensible Firmware Interface) started life as EFI, invented by Intel in the late 1990s for the Itanium platform. Itanium did not have a legacy to preserve, so Intel designed a clean replacement for BIOS from scratch. When Itanium fizzled commercially, the EFI specification was transferred to the UEFI Forum (a consortium including Intel, AMD, Microsoft, and all the major motherboard vendors) and evolved into the spec we have today.
UEFI is not a single piece of code. It is a specification that describes:
- A boot flow divided into phases, each with well-defined responsibilities.
- A protocol model where firmware drivers publish interfaces that other code can consume.
- A platform-independent C API.
- A file system layer (FAT32) that firmware can read.
- A set of standard executable formats (PE/COFF for compiled modules, a shell for interactive use).
- A configuration database stored in NVRAM.
- A memory map exposed to the OS.
- A standard boot manager that understands what operating systems look like.
The reference implementation is called EDK II (EFI Development Kit 2), maintained by TianoCore. Most vendor firmware is built from EDK II with vendor-specific additions. You can download EDK II, build it yourself, and boot it in QEMU to see exactly what modern firmware does.
The Phases of UEFI Boot
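The boot flow is divided into a sequence of phases, each with a specific job. (Strictly speaking, the phase names come from the UEFI Forum's companion Platform Initialization specification rather than from the UEFI specification itself.) Everything that happens between power-on and operating system handoff fits into one of them.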
SEC (Security phase). This is the very first code that runs. Its job is to establish a temporary execution environment before RAM is available. It uses the CPU's cache as RAM (a trick called cache-as-RAM, or CAR), setting up a small region of cache lines that behave like writable memory. SEC then hands off to PEI with a list of firmware modules to execute.
PEI (Pre-EFI Initialisation). PEI's job is to turn the hardware into a state where full DXE drivers can run. That means initialising DRAM. Memory training is one of the most complex parts of modern firmware: the memory controller runs a training algorithm to find the correct timing parameters for each DIMM, sweeping delay and voltage settings lane by lane until the DDR signalling is reliable. On DDR5 systems, this can take several seconds and is why cold boots feel noticeably slower than warm boots (the results are cached in an MRC cache and reused if the memory configuration has not changed).
PEI modules are called PEIMs (PEI Modules). They communicate through a set of shared data structures called PPIs (PEIM-to-PEIM Interfaces). The PEI core discovers PEIMs in a firmware volume, dispatches them in dependency order, and tracks shared state in a HOB (Hand-Off Block) list.
DXE (Driver Execution Environment). Once RAM is up, DXE takes over. DXE is a full driver environment: it has a dispatcher, a service table, a protocol database, and a plugin architecture. DXE drivers install protocols that other drivers consume. A typical DXE phase enumerates PCI devices, loads drivers for each of them, initialises the chipset, sets up USB, and so on.
The DXE protocol model is the heart of UEFI. A protocol is a pair: a GUID identifying the interface, and a C struct of function pointers implementing it. For example, EFI_BLOCK_IO_PROTOCOL (GUID 964e5b21-6459-11d2-8e39-00a0c969723b) describes how to read and write sectors from a block device. Any driver that can read and write blocks installs this protocol on its device handle. Any consumer that wants to read blocks looks up the protocol by GUID and calls its ReadBlocks function. This is how file system drivers, disk encryption drivers, and boot manager code talk to storage without knowing anything about the underlying hardware.
Similar protocols exist for file systems (EFI_SIMPLE_FILE_SYSTEM_PROTOCOL), graphics (EFI_GRAPHICS_OUTPUT_PROTOCOL), network (EFI_SIMPLE_NETWORK_PROTOCOL), key input (EFI_SIMPLE_TEXT_INPUT_PROTOCOL), and dozens more. A complete DXE environment has hundreds of protocols installed on dozens of device handles.
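The "GUID plus struct of function pointers" idea is easier to see in a toy model that runs as an ordinary C program. Everything below is a hypothetical stand-in: the types Guid, Handle, and ToyBlockIo and the functions install_protocol and handle_protocol are invented for illustration and are far simpler than the real handle database, but the producer and consumer roles are the same.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the firmware's types; real firmware uses
 * EFI_GUID, EFI_HANDLE and a far richer handle database. */
typedef struct { unsigned char bytes[16]; } Guid;

typedef struct {
    Guid  guid;        /* which interface this is */
    void *interface;   /* struct of function pointers implementing it */
} ProtocolEntry;

#define MAX_PROTOCOLS 8
typedef struct {
    ProtocolEntry entries[MAX_PROTOCOLS];
    size_t        count;
} Handle;

/* A toy "block I/O" interface: one function pointer plus some state. */
typedef struct ToyBlockIo {
    int (*read_block)(struct ToyBlockIo *self, unsigned lba, unsigned char *buf);
    unsigned block_size;
} ToyBlockIo;

/* Producer side: a driver publishes an interface on a handle. */
int install_protocol(Handle *h, const Guid *guid, void *interface)
{
    if (h->count == MAX_PROTOCOLS)
        return -1;
    h->entries[h->count].guid = *guid;
    h->entries[h->count].interface = interface;
    h->count++;
    return 0;
}

/* Consumer side: look an interface up by GUID; NULL if not installed. */
void *handle_protocol(const Handle *h, const Guid *guid)
{
    for (size_t i = 0; i < h->count; i++)
        if (memcmp(&h->entries[i].guid, guid, sizeof *guid) == 0)
            return h->entries[i].interface;
    return NULL;
}

/* An example driver: a fake disk that fills each block with its LBA. */
int ramdisk_read(ToyBlockIo *self, unsigned lba, unsigned char *buf)
{
    memset(buf, (int)(lba & 0xFF), self->block_size);
    return 0;
}
```

A consumer that calls handle_protocol and then read_block never learns that the device behind the pointer is a fake RAM disk, which is the decoupling the protocol model is designed to provide.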
BDS (Boot Device Selection). When DXE has finished initialising drivers, BDS takes over. BDS is the "boot manager" from the user's perspective: it reads the boot order from NVRAM, tries each boot option in turn, and launches the first one that works. A boot option is a Load Option record stored in an NVRAM variable (for example Boot0000, Boot0001, etc.) whose content is a Device Path: a structured description of where to find the bootable file.
If you have ever run efibootmgr -v on Linux, you have seen these records:
BootCurrent: 0001
Timeout: 0 seconds
BootOrder: 0001,0000,2001,2002,2003
Boot0000* Windows Boot Manager HD(1,GPT,...)/File(\EFI\Microsoft\Boot\bootmgfw.efi)
Boot0001* debian HD(1,GPT,...)/File(\EFI\debian\grubx64.efi)
Boot2001* EFI USB Device...
Each entry is a device path plus optional metadata. BDS walks them in order, loads the referenced file, and transfers control to its entry point. That is the moment you leave firmware and enter the operating system.
TSL and RT (Transient System Load and Runtime). Once the OS loader starts, the system is in TSL. The OS loader can still call UEFI Boot Services (memory allocation, file I/O, protocol lookup) until it calls ExitBootServices. After that, only UEFI Runtime Services (a much smaller set covering things like variable access and reset) remain available. The kernel is now in charge.
What Actually Launches Your OS
A UEFI application is a PE32+ executable, the same format used by Windows .exe files but with a specific subsystem value (IMAGE_SUBSYSTEM_EFI_APPLICATION). The entry point is efi_main(ImageHandle, SystemTable). That entry point receives a handle to itself and a pointer to the UEFI System Table, which contains pointers to Boot Services, Runtime Services, and the list of installed protocols.
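To make the "same format, different subsystem" point concrete, here is a rough sketch of a check that a buffer is a PE32+ image marked as an EFI application. It runs as ordinary user-space C, not in firmware. The function name is made up; the offsets used (e_lfanew at 0x3C, a 20-byte COFF header after the 4-byte signature, optional-header magic 0x20B, Subsystem at offset 68 of the optional header, value 10 for EFI applications) follow the published PE/COFF layout.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PE32PLUS_MAGIC                  0x020B
#define IMAGE_SUBSYSTEM_EFI_APPLICATION 10

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return 1 if the buffer looks like a PE32+ image whose subsystem field
 * marks it as an EFI application, 0 otherwise. */
int is_efi_application(const uint8_t *img, size_t len)
{
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return 0;
    uint32_t pe = rd32(img + 0x3C);              /* e_lfanew */
    /* signature (4) + COFF header (20) + optional header up to Subsystem */
    if (pe > len || len - pe < 4 + 20 + 70)
        return 0;
    if (memcmp(img + pe, "PE\0\0", 4) != 0)
        return 0;
    const uint8_t *opt = img + pe + 4 + 20;      /* optional header */
    if (rd16(opt) != PE32PLUS_MAGIC)
        return 0;
    return rd16(opt + 68) == IMAGE_SUBSYSTEM_EFI_APPLICATION;
}
```

This covers only the header fields mentioned in the text; a real loader also validates sections, applies relocations, and (with Secure Boot on) verifies the signature.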
A minimal "hello world" UEFI application looks like this:
#include <efi.h>
#include <efilib.h>
EFI_STATUS efi_main(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable) {
    // Set up GNU-EFI's library globals (ST, BS, RT) before using Print
    InitializeLib(ImageHandle, SystemTable);
    Print(L"Hello from UEFI, Barcelona!\n");
    return EFI_SUCCESS;
}
Compile that with GNU-EFI, place the resulting .efi file on a FAT32 partition, and boot from it. If you copy it to the fallback path \EFI\BOOT\BOOTX64.EFI on a USB stick, firmware will find it without any boot entry being configured. The firmware will load it, map it into memory, relocate it, and call efi_main. You can boot any UEFI machine with that stick, no operating system required.
Real bootloaders do much more. GRUB's EFI build is a UEFI application that uses EFI_SIMPLE_FILE_SYSTEM_PROTOCOL to read its configuration, loads the Linux kernel, sets up boot parameters, and calls ExitBootServices before jumping to the kernel entry point. The Linux kernel itself can be built as a UEFI application directly ("EFI stub"), which means the kernel image is a valid PE32+ file that firmware can load without any intermediate bootloader. This is the basis of systemd-boot and of direct kernel boot on systems that do not use GRUB.
Protocol Lookup in Practice
A concrete example of how DXE protocols get used helps make the abstraction feel real. Say a file system driver wants to read a block device. In pseudocode it looks like this:
EFI_STATUS status;
EFI_HANDLE *handles = NULL;
UINTN handleCount = 0;
// Find every handle with the Block I/O protocol installed
status = gBS->LocateHandleBuffer(
    ByProtocol,
    &gEfiBlockIoProtocolGuid,
    NULL,
    &handleCount,
    &handles
);
if (EFI_ERROR(status)) {
    return status;  // nothing found, or the lookup itself failed
}
for (UINTN i = 0; i < handleCount; i++) {
    EFI_BLOCK_IO_PROTOCOL *bio;
    status = gBS->HandleProtocol(
        handles[i],
        &gEfiBlockIoProtocolGuid,
        (VOID **)&bio
    );
    if (EFI_ERROR(status) || !bio->Media->MediaPresent) {
        continue;
    }
    if (bio->Media->LogicalPartition) {
        // It's a partition: read its first block.
        // The transfer size must be a multiple of the device block size.
        UINT8 buffer[4096];
        UINTN blockSize = bio->Media->BlockSize;
        if (blockSize <= sizeof(buffer)) {
            status = bio->ReadBlocks(bio, bio->Media->MediaId, 0, blockSize, buffer);
            // Check for a filesystem signature...
        }
    }
}
// LocateHandleBuffer allocated the array from pool memory; release it
gBS->FreePool(handles);
Every operation is indirected through the protocol database. The caller knows nothing about the specific storage driver. The storage driver knows nothing about the caller. The only shared knowledge is the GUID and the struct definition. This is what lets a single UEFI firmware image boot off SATA, NVMe, USB, iSCSI, and network block devices without any of the higher layers caring which is which.
GPT: The Partition Table UEFI Expects
GPT (GUID Partition Table) is the partitioning scheme used with UEFI. It replaces the MBR's four-partition, 2 TiB limit with a 128-entry table that supports disks up to 9.4 ZB (zettabytes). Its structure is worth knowing because it interacts with every UEFI boot flow.
A GPT-formatted disk has the following layout:
- LBA 0: a protective MBR. It contains a single partition entry covering the whole disk and claiming type 0xEE (GPT Protective). Legacy tools that do not understand GPT see a disk that looks full and refuse to modify it, protecting the GPT from damage. Tools that understand GPT see the protective marker and switch to GPT parsing.
- LBA 1: the GPT header. This holds a signature (EFI PART), revision, header size, CRC, a self-reference LBA, a backup header LBA, the start and end LBA of usable space, the disk GUID, the partition entry array LBA, the number of partition entries, the size of each entry, and the CRC of the partition entry array.
- LBA 2 through 33: the partition entry array, 128 entries of 128 bytes each by default. Each entry contains a partition type GUID, a unique partition GUID, first LBA, last LBA, attribute flags, and a 72-byte UTF-16 name.
- End of disk: a backup GPT header and a backup partition array, mirroring the primary. If the primary is damaged, the backup can rebuild it.
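The numbers in this layout can be checked with a little arithmetic, assuming 512-byte sectors and the default array dimensions. The first line shows why the entry array spans LBA 2 through 33 (so the first usable sector is 34); the second shows where the 9.4 ZB ceiling comes from with 64-bit LBAs:

```latex
128\ \text{entries} \times 128\ \text{bytes} = 16384\ \text{bytes} = 32 \times 512\ \text{bytes}
  \quad\Rightarrow\quad \text{LBA } 2 \ldots 33

2^{64}\ \text{sectors} \times 512\ \text{bytes} = 2^{73}\ \text{bytes} \approx 9.44 \times 10^{21}\ \text{bytes}
```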
The partition type GUID identifies what the partition contains. A few important GUIDs:
- c12a7328-f81f-11d2-ba4b-00a0c93ec93b: EFI System Partition (ESP). Formatted as FAT32. Contains .efi bootloaders.
- 0fc63daf-8483-4772-8e79-3d69d8477de4: Linux filesystem data.
- e3c9e316-0b5c-4db8-817d-f92df00215ae: Microsoft Reserved.
- ebd0a0a2-b9e5-4433-87c0-68b6b72699c7: Microsoft basic data (NTFS, FAT).
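One detail that trips people up when reading these GUIDs out of a raw partition entry: the first three fields are stored little-endian on disk, so the bytes do not appear in the order the text form suggests. A small sketch of the conversion follows; the function name is illustrative.

```c
#include <stdint.h>
#include <stdio.h>

/* Format 16 on-disk GUID bytes as the usual 8-4-4-4-12 text form.
 * The first three fields are stored little-endian, the rest as-is.
 * `out` must have room for 37 bytes (36 characters plus NUL). */
void guid_to_string(const uint8_t g[16], char out[37])
{
    snprintf(out, 37,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             g[3], g[2], g[1], g[0],      /* 32-bit field, byte-swapped */
             g[5], g[4],                  /* 16-bit field, byte-swapped */
             g[7], g[6],                  /* 16-bit field, byte-swapped */
             g[8], g[9],                  /* stored in text order */
             g[10], g[11], g[12], g[13], g[14], g[15]);
}
```

For example, an ESP entry begins with the bytes 28 73 2a c1 1f f8 d2 11 on disk, which this function renders as c12a7328-f81f-11d2 followed by the remaining fields in stored order.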
The ESP is the key partition for boot. UEFI firmware knows how to mount FAT32, walk to \EFI\<vendor>\<name>.efi, and execute it. Every operating system that boots on UEFI puts its bootloader in the ESP. On a dual-boot system you might see directories like \EFI\Microsoft\Boot\, \EFI\debian\, \EFI\ubuntu\, and \EFI\BOOT\ (the fallback path). The boot order in NVRAM decides which one wins.
The ESP is usually the first partition on the disk, sized between 100 MiB and 1 GiB. Windows Setup creates a 100 MiB ESP by default. Most Linux installers pick 512 MiB. Some distros (openSUSE, Fedora) suggest 1 GiB to make room for kernel and initrd files.
Reading a GPT with dd
You can inspect the GPT of a disk directly. On a Linux machine with a GPT-formatted disk at /dev/sda:
$ sudo dd if=/dev/sda bs=512 skip=1 count=1 2>/dev/null | xxd | head
00000000: 4546 4920 5041 5254 0000 0100 5c00 0000 EFI PART....\...
00000010: 3b3d 2c21 0000 0000 0100 0000 0000 0000 ;=,!............
00000020: af0e 7407 0000 0000 2200 0000 0000 0000 ..t....."......
00000030: ce0e 7407 0000 0000 f54a 4d22 3fb9 4a5b ..t......JM"?.J[
...
The first eight bytes are EFI PART, the GPT signature. The rest of the header follows the layout described in the UEFI spec. Tools like gdisk, parted, and sgdisk parse this for you, but occasionally doing it by hand is instructive. The backup GPT header at the end of the disk mirrors the primary, except that the self and backup LBA fields are swapped, the partition array pointer refers to the backup array, and the header CRC differs as a result.
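Doing it by hand in code is not much harder. The sketch below decodes the first 48 bytes of a GPT header from a raw sector buffer. The struct and function names are made up for this example; the field offsets follow the header layout in the UEFI specification (signature at 0, revision at 8, header size at 12, CRC at 16, then the three 64-bit LBA fields from offset 24).

```c
#include <stdint.h>
#include <string.h>

/* Fields from the first 48 bytes of a GPT header, decoded from
 * little-endian on-disk form. Struct and function names are illustrative. */
typedef struct {
    uint32_t revision;
    uint32_t header_size;
    uint32_t header_crc32;
    uint64_t my_lba;
    uint64_t alternate_lba;
    uint64_t first_usable_lba;
} GptHeaderSummary;

/* Read an nbytes-wide little-endian integer from a byte buffer. */
static uint64_t le(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 0 on success, -1 if the "EFI PART" signature is missing. */
int gpt_parse_header(const uint8_t *sector, GptHeaderSummary *out)
{
    if (memcmp(sector, "EFI PART", 8) != 0)
        return -1;
    out->revision         = (uint32_t)le(sector + 8, 4);
    out->header_size      = (uint32_t)le(sector + 12, 4);
    out->header_crc32     = (uint32_t)le(sector + 16, 4);
    /* bytes 20-23 are reserved */
    out->my_lba           = le(sector + 24, 8);
    out->alternate_lba    = le(sector + 32, 8);
    out->first_usable_lba = le(sector + 40, 8);
    return 0;
}
```

Fed the bytes from the hex dump above, this yields revision 0x00010000 (1.0), a header size of 92 bytes, a self-reference LBA of 1, and a first usable LBA of 34. A complete parser would also verify the header CRC before trusting any field.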
gdisk -l /dev/sda gives you a nicer view:
Disk /dev/sda: 1953525168 sectors, 931.5 GiB
Sector size (logical/physical): 512/4096 bytes
Disk identifier (GUID): 214D4AF5-B93F-5B4A-8D4F-...
First usable sector is 34, last usable sector is 1953525134
Number Start (sector) End (sector) Size Code Name
1 2048 1050623 512.0 MiB EF00 EFI System
2 1050624 1953523711 931.0 GiB 8300 Linux filesystem
This is the information UEFI firmware reads to figure out which partition is the ESP, and it is the same information tools like efibootmgr use when constructing device paths for new boot entries.
NVRAM and UEFI Variables
UEFI stores configuration in NVRAM, a small area of flash (typically 128 KiB to 512 KiB) dedicated to variable storage. Variables have a name, a vendor GUID (to avoid collisions), and attribute flags that control their persistence and access. The most important attribute flags are:
- EFI_VARIABLE_NON_VOLATILE: the variable persists across reboots. Without this, it only lives for the current boot.
- EFI_VARIABLE_BOOTSERVICE_ACCESS: the variable is accessible to boot services code.
- EFI_VARIABLE_RUNTIME_ACCESS: the variable is accessible after ExitBootServices, when the OS is running.
- EFI_VARIABLE_AUTHENTICATED_WRITE_ACCESS: writes require a signature from an approved key. This flag is deprecated in current versions of the specification; Secure Boot's variable protection relies on its successor, EFI_VARIABLE_TIME_BASED_AUTHENTICATED_WRITE_ACCESS.
Linux exposes UEFI variables through /sys/firmware/efi/efivars/. Each file represents a variable. The filename is <Name>-<GUID>. Reading the file returns the variable's content prefixed by a 4-byte attribute mask.
$ ls /sys/firmware/efi/efivars/ | head
Boot0000-8be4df61-93ca-11d2-aa0d-00e098032b8c
Boot0001-8be4df61-93ca-11d2-aa0d-00e098032b8c
BootOrder-8be4df61-93ca-11d2-aa0d-00e098032b8c
Lang-8be4df61-93ca-11d2-aa0d-00e098032b8c
PK-8be4df61-93ca-11d2-aa0d-00e098032b8c
...
Boot variables (Boot0000, Boot0001, etc.) are named with a four-digit hexadecimal number, and each one holds a single load option. BootOrder is an array of 16-bit values in priority order, each value being the number of one Boot#### variable. Changing the boot order is as simple as writing a new BootOrder variable, which is what efibootmgr -o does.
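As a sketch of what a tool sees when it reads the BootOrder file from efivarfs, the function below splits the raw bytes into the 4-byte attribute mask and the list of 16-bit entry numbers. The function name is hypothetical; the three attribute values are the ones defined by the UEFI specification.

```c
#include <stdint.h>
#include <stddef.h>

#define EFI_VARIABLE_NON_VOLATILE       0x00000001u
#define EFI_VARIABLE_BOOTSERVICE_ACCESS 0x00000002u
#define EFI_VARIABLE_RUNTIME_ACCESS     0x00000004u

/* Decode the raw contents of an efivarfs BootOrder file: a 4-byte
 * little-endian attribute mask followed by 16-bit little-endian entry
 * numbers. Returns the number of entries written to `order`, or -1 if
 * the buffer is malformed or does not fit. */
int parse_boot_order(const uint8_t *file, size_t len,
                     uint32_t *attrs, uint16_t *order, size_t max_entries)
{
    if (len < 4 || (len - 4) % 2 != 0)
        return -1;
    *attrs = (uint32_t)file[0] | ((uint32_t)file[1] << 8) |
             ((uint32_t)file[2] << 16) | ((uint32_t)file[3] << 24);
    size_t n = (len - 4) / 2;
    if (n > max_entries)
        return -1;
    for (size_t i = 0; i < n; i++)
        order[i] = (uint16_t)(file[4 + 2 * i] | (file[5 + 2 * i] << 8));
    return (int)n;
}
```

A BootOrder of 0001,0000,2001 with the usual non-volatile, boot-service, and runtime attributes would appear in the file as the ten bytes 07 00 00 00 01 00 00 00 01 20.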
Be careful with NVRAM writes. A bug in firmware or a poorly-tested kernel once caused an infamous bricking issue in certain Samsung laptops around 2013, where writing too many variables filled up the NVRAM and the firmware refused to boot. Modern kernels gate NVRAM writes behind sanity checks, but the memory is still small and should be treated as precious.
Modifying Boot Entries from Linux
efibootmgr is the standard tool on Linux for manipulating UEFI boot variables. A few commonly useful operations:
# List current boot entries
$ efibootmgr -v
# Change boot order
$ sudo efibootmgr -o 0001,0000,2001
# Create a new entry pointing at a specific .efi file
$ sudo efibootmgr --create \
--disk /dev/nvme0n1 --part 1 \
--label "Linux 6.8" \
--loader '\EFI\debian\vmlinuz-6.8.0'
# Remove an entry
$ sudo efibootmgr --bootnum 0003 --delete-bootnum
# Set next-boot-only override (boots this once, then reverts to normal order)
$ sudo efibootmgr --bootnext 0001
The --bootnext flag is especially useful for testing: you can queue up a dangerous boot configuration without making it the permanent default, and if it fails, the next reboot falls back to the normal order.
All of these commands translate directly into reads and writes of UEFI variables via efivarfs. You can see the same effect by writing files directly in /sys/firmware/efi/efivars/, though efibootmgr handles the attribute prefix and validation for you.
CSM: The Compatibility Layer
Early UEFI systems still had to boot legacy operating systems. Windows XP did not understand UEFI. Neither did many older Linux distros. The industry's answer was CSM (Compatibility Support Module), an optional piece of firmware that emulated a traditional BIOS environment inside a UEFI system.
When CSM was enabled, the firmware would publish legacy BIOS interrupt handlers (INT 10h, INT 13h, INT 16h, etc.), load and run the boot sector at LBA 0 of the boot disk the way a legacy BIOS would (the real MBR on MBR-formatted disks, or whatever boot code sits in the protective MBR on GPT disks), and run video cards in VGA text mode. The CSM module translated BIOS calls into UEFI protocol invocations under the hood. For all practical purposes, a CSM-enabled UEFI system looked like a legacy BIOS to any operating system.
CSM was a lifeline for a decade. It let users upgrade their motherboard or BIOS without reinstalling their operating system. It let them dual-boot old and new OSes. It let them run hardware diagnostics that assumed a legacy environment. It was also a significant maintenance burden for firmware vendors, a security liability (legacy boot bypassed Secure Boot), and a persistent source of subtle bugs where the UEFI and legacy worldviews disagreed about hardware state.
Intel announced the end of CSM support on its platforms, and the removal took effect starting with the Tiger Lake platform in 2020 and accelerated with 12th-gen Alder Lake in 2021. The firmware on modern Intel platforms either has no CSM at all, or has it disabled by default and limited to a narrow subset of hardware. AMD platforms have followed. Microsoft's Windows 11 requires UEFI-native boot (no CSM) and Secure Boot, making CSM effectively dead on consumer hardware by 2024.
For most users this is invisible. For anyone trying to boot a 10-year-old operating system image, or using an older PCIe card that only has a legacy option ROM, the end of CSM means buying newer hardware or finding a UEFI-native replacement.
ACPI: Describing the Machine to the OS
One of the most important things firmware does, which is not strictly about booting but which is enabled by UEFI, is build the ACPI tables. ACPI (Advanced Configuration and Power Interface) is a specification describing how operating systems discover hardware, configure power states, and handle hardware events. It replaced the Plug and Play BIOS and APM (Advanced Power Management) interfaces of the 1990s.
ACPI is table-driven. Firmware constructs a set of data tables in memory and registers their location with the OS through UEFI or a well-known memory region. The key tables:
- RSDP (Root System Description Pointer): a small structure containing pointers to the RSDT or XSDT. Located through the UEFI Configuration Table.
- RSDT / XSDT: root table listing pointers to all other ACPI tables.
- FADT (Fixed ACPI Description Table): fixed hardware features, power management registers, and pointers to DSDT and FACS.
- DSDT (Differentiated System Description Table): the main AML bytecode describing devices, their resources, and their power states. This is where most of the platform description lives.
- SSDT (Secondary System Description Table): additional AML bytecode, often dynamically loaded for CPU topology or specific devices.
- MADT (Multiple APIC Description Table): CPU count, APIC IDs, interrupt routing.
- MCFG: PCI Express configuration space base address.
- HPET: High Precision Event Timer location.
- SRAT / SLIT: NUMA topology.
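All of the tables listed above except the RSDP start with the same 36-byte header, which includes a signature, a length, and a checksum byte. A minimal validity check, of the kind an OS performs before trusting a table, might look like the following. The function name is made up; the header layout (signature at offset 0, 32-bit length at offset 4, checksum at offset 9) is the standard ACPI one.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define ACPI_HEADER_SIZE 36  /* common header shared by RSDT, FADT, DSDT, ... */

/* Verify an ACPI system description table: the signature must match,
 * the 32-bit length at offset 4 must fit the buffer, and all bytes of
 * the table must sum to zero modulo 256 (the checksum byte at offset 9
 * is chosen by firmware to make it so). Returns 1 if valid, else 0. */
int acpi_table_valid(const uint8_t *table, size_t avail, const char sig[4])
{
    if (avail < ACPI_HEADER_SIZE || memcmp(table, sig, 4) != 0)
        return 0;
    uint32_t length = (uint32_t)table[4] | ((uint32_t)table[5] << 8) |
                      ((uint32_t)table[6] << 16) | ((uint32_t)table[7] << 24);
    if (length < ACPI_HEADER_SIZE || length > avail)
        return 0;
    uint8_t sum = 0;
    for (uint32_t i = 0; i < length; i++)
        sum = (uint8_t)(sum + table[i]);
    return sum == 0;
}
```

On Linux the same check can be tried against the files under /sys/firmware/acpi/tables/, each of which is one raw table.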
AML (ACPI Machine Language) is a small bytecode interpreted by the kernel. A typical DSDT contains thousands of lines of AML describing the motherboard's devices: which bus each sits on, what IRQ it uses, what registers control its power state, what control methods to call on sleep, wake, and lid close. The kernel runs an AML interpreter to execute these methods. This is how Linux knows that your laptop's lid is closed, or how Windows knows how to put your NVMe drive into a deep sleep state.
If you have ever wondered why dmesg has thousands of ACPI-related lines at boot, this is the reason. The kernel is walking through thousands of AML operations to enumerate and configure every device on the board.
Firmware builds these tables at DXE time, using information collected from hardware probes and static vendor data. A bug in ACPI construction (wrong NUMA distances, missing devices, incorrect IRQ routing) causes operating systems to see the wrong picture of the hardware and behave badly. Almost every "X does not work on Y Linux version" laptop forum post is, at some level, an ACPI problem.
A Worked DSDT Example
To make AML less abstract, here is what a small DSDT excerpt looks like, disassembled from bytecode back to human-readable ASL:
Scope (_SB)
{
    Device (LID0)
    {
        Name (_HID, EisaId ("PNP0C0D"))
        Method (_LID, 0, NotSerialized)
        {
            Return (LEqual (LIDS, One))
        }
    }
    Device (BAT0)
    {
        Name (_HID, EisaId ("PNP0C0A"))
        Name (_UID, 0x00)
        Method (_STA, 0, NotSerialized)
        {
            If (LAnd (BATP, One))
            {
                Return (0x1F)
            }
            Return (0x0F)
        }
        Method (_BIF, 0, NotSerialized)
        {
            ...
        }
    }
}
When the kernel boots, its ACPI subsystem loads the DSDT, parses it into an internal tree, and looks for devices matching the driver classes it has loaded. For each device, it calls _STA to check presence, _BIF for battery information, _LID for lid status, and dozens of other methods. Every single laptop feature that crosses the firmware/kernel boundary goes through this interface.
You can dump your own DSDT on Linux:
$ sudo cp /sys/firmware/acpi/tables/DSDT /tmp/dsdt.aml
$ iasl -d /tmp/dsdt.aml
$ less /tmp/dsdt.dsl
The disassembled file can be tens of thousands of lines on a modern laptop. Reading it is a strange experience: you are looking at the handshake between the OEM's firmware engineers and your operating system, written in a bytecode designed in the late 1990s, still in active use today.
Secure Boot
Secure Boot is the UEFI mechanism for verifying the authenticity of every executable loaded before the operating system takes over. It is built on a hierarchy of cryptographic keys stored in NVRAM.
- PK (Platform Key): a single key controlled by the platform owner. Whoever holds the PK can authorise updates to the other Secure Boot variables. Typically set by the OEM at manufacturing time.
- KEK (Key Exchange Key): a set of keys authorised to update db and dbx. Typically includes the Microsoft KEK and the OEM's own KEK.
- db (Signature Database): allowed signatures and hashes. Any executable signed by a key in db, or whose hash is explicitly in db, will load.
- dbx (Forbidden Signatures Database): revoked signatures and hashes. Binaries matching dbx entries are refused even if they are also signed by something in db.
When UEFI loads a PE32+ executable, it checks the Authenticode signature against db and dbx before handing control over. A signature that matches dbx is immediately rejected. A signature that matches db and is not in dbx is accepted. Unsigned binaries, or those signed by unknown keys, fail the check unless the user has explicitly disabled Secure Boot in firmware setup.
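The ordering of those checks matters: dbx is consulted first and always wins. The sketch below models only the hash-matching half of the decision in plain C. It is a simplification under stated assumptions: the HashList type and function names are invented, and real firmware additionally verifies Authenticode signatures against certificates held in db and dbx, which is omitted here.

```c
#include <stddef.h>
#include <string.h>

#define HASH_LEN 32   /* e.g. a SHA-256 digest */

/* A list of fixed-size image hashes, standing in for db or dbx. */
typedef struct {
    const unsigned char (*hashes)[HASH_LEN];
    size_t count;
} HashList;

static int list_contains(const HashList *list, const unsigned char *hash)
{
    for (size_t i = 0; i < list->count; i++)
        if (memcmp(list->hashes[i], hash, HASH_LEN) == 0)
            return 1;
    return 0;
}

/* Simplified allow/deny decision using image hashes only.
 * Returns 1 to load the image, 0 to refuse it. */
int secure_boot_allows(const HashList *db, const HashList *dbx,
                       const unsigned char *image_hash)
{
    if (list_contains(dbx, image_hash))
        return 0;                          /* revoked: dbx always wins */
    return list_contains(db, image_hash);  /* otherwise must be in db */
}
```

An image whose hash appears in both lists is refused, and an image that appears in neither is refused as well, matching the rules described above.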
Most Linux distros use the Shim loader as their Secure Boot entry point. Shim is a small bootloader signed by Microsoft (because Microsoft's key is in virtually every OEM's db). Shim contains an embedded distribution-specific certificate and chain-loads a GRUB build signed by the matching key. Shim also maintains a list of Machine Owner Keys (MOKs) and provides a MOK manager where users can enrol their own keys for custom kernels.
The full trust chain looks like this:
Firmware (signed binaries in db)
↓
Shim (signed by Microsoft → db)
↓ verifies
GRUB (signed by distro → embedded in Shim)
↓ verifies
Linux kernel (signed by distro key)
↓ verifies
Kernel modules (if module signing is on)
Breaking any link in the chain results in a refused boot. Changing the kernel without re-signing it breaks the chain. So does installing a custom GRUB build. This is why every time you compile a custom Linux kernel on a Secure Boot system in Zurich, you have to deal with MOK enrolment or disable Secure Boot altogether.
The biggest Secure Boot failure in recent history was the BootHole vulnerability in GRUB2, disclosed in 2020. A buffer overflow in GRUB's configuration file parser allowed arbitrary code execution before the kernel loaded, completely bypassing Secure Boot. The fix required revoking every affected GRUB signature across every distro, adding them to dbx, and shipping new signed binaries. Because dbx is finite and many old systems have no way to update it, some machines still carry the vulnerability years later.
Enrolling Custom Keys
If you are building your own bootloader or kernel, you probably want to sign it with your own key rather than ship something signed by Microsoft. On a Shim-based system the easiest route is to enrol the key as a Machine Owner Key (MOK) rather than writing it into Secure Boot's db directly:
# Generate a key pair and a self-signed certificate (PEM, as sbsign expects)
$ openssl req -new -x509 -newkey rsa:2048 -keyout MOK.key \
    -out MOK.crt -nodes -days 3650 \
    -subj "/CN=Development Key/"
# Convert the certificate to DER, the format mokutil expects
$ openssl x509 -in MOK.crt -outform DER -out MOK.der
# Sign a binary with it
$ sbsign --key MOK.key --cert MOK.crt \
    --output /boot/vmlinuz-6.8-custom.signed /boot/vmlinuz-6.8-custom
# Enrol the public key through shim
$ sudo mokutil --import MOK.der
mokutil --import stages the enrolment and asks you to choose a one-time password. On the next reboot, Shim presents a MOK management menu (MokManager) that asks you to confirm the enrolment with that password. After confirmation, the key is stored in a shim-specific variable and used to verify subsequent boots. This keeps Microsoft out of the loop for your own binaries: they are verified against a key you control, and the trust chain flows from Shim (signed by Microsoft) through your MOK (a key you control) to the kernel (signed with that key).
TPM and Measured Boot
Secure Boot verifies signatures. Measured Boot records what was loaded. The two are complementary.
A TPM (Trusted Platform Module) is a small cryptographic chip attached to the motherboard that provides key storage, random number generation, and a set of Platform Configuration Registers (PCRs). PCRs are not normal registers: they can only be "extended", not written. Extending a PCR hashes its current value with a new value and stores the result. This makes PCR contents a running hash of every measurement pushed into them.
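Written out, the extend operation looks like this, where H is the hash algorithm of the PCR bank (for example SHA-256), the double bar denotes concatenation, and m is the digest of the thing being measured. The second line assumes the register starts at all zeros after reset, which is the typical case:

```latex
\mathrm{PCR}_{\text{new}} = H\left(\mathrm{PCR}_{\text{old}} \,\|\, m\right)

\mathrm{PCR}_{n} = H\left(\cdots H\left(H\left(0 \,\|\, m_1\right) \,\|\, m_2\right) \cdots \,\|\, m_n\right)
```

Because each step feeds the previous value back into the hash, the final value depends on every measurement and on the order in which they were made, and there is no operation that sets the register to a chosen value.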
During boot, firmware extends PCRs with hashes of every module it loads: the UEFI firmware itself, the option ROMs, the boot manager, the kernel, the initrd. The result is a set of PCR values that uniquely identify the boot sequence. If anything changed, the PCR values would be different.
The practical payoff is sealed storage. A system running BitLocker or LUKS with TPM-based key release will not unseal the encryption key unless the PCR values match the expected set. If an attacker modifies the bootloader (for example, to capture the user's disk encryption password), the PCR values change, the TPM refuses to release the key, and the disk stays locked. The attacker can still modify the bootloader, but doing so breaks the system's ability to unlock its own storage. The same measurements also support remote attestation, where a server checks a TPM-signed quote of the PCR values before trusting the machine.
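The key-release decision can be sketched as a toy model. This is an illustration under loud assumptions: the names `boot_pcr`, `seal`, and `unseal` are hypothetical, only one PCR is used, and the secret is stored in the clear, whereas a real TPM keeps it encrypted and enforces the policy in hardware.

```python
import hashlib
import hmac
from dataclasses import dataclass


def boot_pcr(stages) -> bytes:
    """Fold each boot stage into a single SHA-256 PCR, starting from zeros."""
    pcr = bytes(32)
    for stage in stages:
        pcr = hashlib.sha256(pcr + hashlib.sha256(stage).digest()).digest()
    return pcr


@dataclass(frozen=True)
class SealedKey:
    expected_pcr: bytes  # policy recorded at enrolment time
    secret: bytes        # toy only: a real TPM never stores this in the clear


def seal(secret: bytes, expected_pcr: bytes) -> SealedKey:
    """Bind a secret to the PCR value of a known-good boot."""
    return SealedKey(expected_pcr, secret)


def unseal(blob: SealedKey, current_pcr: bytes) -> bytes:
    """Release the secret only if the current PCR matches the policy."""
    if not hmac.compare_digest(blob.expected_pcr, current_pcr):
        raise PermissionError("PCR policy mismatch: key stays sealed")
    return blob.secret


if __name__ == "__main__":
    good = [b"shim", b"grub", b"kernel"]
    blob = seal(b"disk-key", boot_pcr(good))
    print(unseal(blob, boot_pcr(good)))
    try:
        unseal(blob, boot_pcr([b"shim", b"grub-with-keylogger", b"kernel"]))
    except PermissionError as err:
        print(err)
```

The tampered boot chain still runs to completion in this model, which mirrors the point above: nothing stops the modification, but the key is no longer available afterwards.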
This is why Windows 11 requires TPM 2.0: BitLocker and Windows Hello depend on measured boot for their security guarantees. It is also why Linux full-disk encryption setups in enterprise environments increasingly use systemd-cryptenroll to tie LUKS unlock to TPM PCRs.
SMM: The Hidden Kernel
One of the more unsettling facts about modern PC firmware is that it does not fully go away when the operating system takes over. System Management Mode (SMM) is an x86 CPU mode with its own memory region, its own interrupt handler, and its own code. It runs at a privilege level above the OS kernel: when a System Management Interrupt (SMI) fires, the CPU saves its current state, switches to SMM, runs the firmware's SMI handler from a region called SMRAM, and returns. The OS has no visibility into what happened.
SMM is used for a grab-bag of platform services: thermal management, fan control, firmware update handling, error reporting, some ACPI method invocations that need low-level hardware access. On a laptop, SMM code is what actually controls the fan when the CPU gets hot, even if the OS is running. SMM code is firmware, written by the OEM, signed by the firmware vendor, and completely inaccessible to the running kernel.
Security researchers have found SMM vulnerabilities that allowed privilege escalation across OS boundaries, because code in SMM has ring -2 level access (below the hypervisor). Firmware patches for SMM issues are shipped through UEFI capsule updates, the standard way to update firmware from a running OS. Ignoring firmware updates means ignoring patches for this invisible layer.
UEFI Shell and Practical Exploration
UEFI includes an optional shell, an interactive command-line environment that runs inside the firmware. Some vendor firmware images ship it as a built-in boot option. On systems where it is not included, you can add your own by dropping Shell.efi onto an ESP and creating a boot entry for it.
The shell has commands that would look familiar to a Unix user:
Shell> map -r
Mapping table
FS0: Alias(s):HD0a65535a1:;BLK1:
PciRoot(0x0)/Pci(0x17,0x0)/Sata(0x0,0xFFFF,0x0)/HD(1,GPT,...)
FS1: Alias(s):HD0a65535a2:;BLK2:
PciRoot(0x0)/Pci(0x17,0x0)/Sata(0x0,0xFFFF,0x0)/HD(2,GPT,...)
BLK0: Alias(s):
PciRoot(0x0)/Pci(0x17,0x0)/Sata(0x0,0xFFFF,0x0)
Shell> fs0:
FS0:\> ls EFI
...
FS0:\> cd EFI\debian
FS0:\EFI\debian> ls
grubx64.efi shimx64.efi BOOTX64.CSV
FS0:\EFI\debian> shimx64.efi
Running shimx64.efi from the shell directly launches the bootloader, with the shell as its parent. This is extremely useful for debugging: you can see exactly what file is loaded, what it prints, and what error it returns.
Other useful commands:
- devtree: prints the device handle hierarchy and every protocol installed on each handle.
- drivers: lists loaded UEFI drivers.
- bcfg: the shell's built-in equivalent of efibootmgr. Can list, create, and modify boot entries.
- dmpstore: dump NVRAM variables.
- pci: enumerate PCI devices.
- smbiosview: dump SMBIOS tables in human-readable form.
If you want to explore the UEFI environment without risking an existing install, the easiest thing is to run OVMF (the UEFI firmware build from EDK II) inside QEMU:
$ qemu-system-x86_64 \
-drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE.fd \
-drive if=pflash,format=raw,file=/tmp/OVMF_VARS.fd \
-drive file=fat:rw:/tmp/espdir,format=raw \
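-net none
Here /tmp/OVMF_VARS.fd should be a writable copy of the OVMF_VARS.fd template that ships alongside OVMF_CODE.fd. Copy Shell.efi to /tmp/espdir/EFI/BOOT/BOOTX64.EFI, boot, and you have an interactive UEFI environment to play with. No hardware risk, fast iteration, full introspection.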
The Boot Process End to End
Let me tie all of this together with a concrete sequence. You press the power button on a laptop in Vienna running Debian with UEFI, Secure Boot enabled, TPM 2.0 present, and LUKS with TPM-based key release.
- Power rails stabilise. The PCH (Platform Controller Hub) releases the CPU from reset.
- CPU fetches its first instruction at 0xFFFFFFF0 from SPI flash.
- SEC phase runs. Cache-as-RAM is set up. Early crypto is initialised.
- PEI phase runs. Memory is trained. DRAM becomes available. PEIMs publish their HOBs.
- DXE phase runs. PCI devices are enumerated. Storage drivers load. USB stack comes up. Graphics output protocol is installed and a framebuffer becomes available. The firmware draws its splash.
- ACPI tables are constructed. SMBIOS tables are populated.
- BDS phase runs. The firmware reads BootOrder from NVRAM, picks the first entry (Boot0001 → debian).
- The firmware opens \EFI\debian\shimx64.efi on the ESP. Shim's signature is checked against db (Microsoft KEK → OEM → db). It is valid. The firmware measures Shim into PCR 4.
- Shim loads. It opens GRUB (\EFI\debian\grubx64.efi), verifies it against its embedded vendor certificate and any enrolled MOKs, measures it, and transfers control.
- GRUB reads its config from /boot/grub/grub.cfg. It presents the menu (or auto-selects).
- GRUB loads the Linux kernel from /boot/vmlinuz-6.8.0-amd64 and the initrd from /boot/initrd.img-6.8.0-amd64. It measures both into PCR 9.
- GRUB hands control to the kernel's EFI stub, which calls ExitBootServices.
- The Linux kernel starts. The CPU is already in 64-bit long mode, so no mode switch is needed. The kernel parses the memory map, sets up its own page tables, and begins executing the normal init sequence.
- The initramfs asks the TPM to unseal the LUKS master key. The TPM checks current PCR values against the policy. They match. The key is released.
- The kernel mounts the root filesystem, runs /sbin/init, and the system is up.
From power-on to GRUB menu: typically 3 to 7 seconds on modern consumer hardware, dominated by PEI memory training and DXE device enumeration. From GRUB to login prompt: another few seconds, dominated by kernel initialisation and userland services.
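The BootOrder lookup from the BDS step can be inspected from a running Linux system. Here is a small sketch, assuming efivarfs is mounted at /sys/firmware/efi/efivars, where each variable appears as a file whose first four bytes are an attribute word followed by the raw data. BootOrder itself is a list of 16-bit little-endian entry numbers. The function name is made up for illustration.

```python
import struct
from pathlib import Path

# Assumptions: the usual efivarfs mount point and the EFI global variable GUID.
EFIVARS = Path("/sys/firmware/efi/efivars")
EFI_GLOBAL_GUID = "8be4df61-93ca-11d2-aa0d-00e098032b8c"


def parse_boot_order(raw: bytes) -> list:
    """Decode an efivarfs BootOrder file into names such as 'Boot0001'."""
    if len(raw) < 4:
        raise ValueError("file too short to hold the attribute word")
    payload = raw[4:]  # skip the 4-byte attribute word efivarfs prepends
    if len(payload) % 2:
        raise ValueError("BootOrder payload is not a list of 16-bit values")
    count = len(payload) // 2
    return [f"Boot{n:04X}" for n in struct.unpack(f"<{count}H", payload)]


if __name__ == "__main__":
    path = EFIVARS / f"BootOrder-{EFI_GLOBAL_GUID}"
    if path.exists():
        print(parse_boot_order(path.read_bytes()))
    else:
        print("no BootOrder variable found (not a UEFI boot, or efivarfs not mounted)")
```

The first name in the returned list is the entry the firmware tries first, which is the same thing efibootmgr reports as the head of BootOrder.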
The Option ROM Question
Add-in cards with their own firmware still exist, but their role has changed in the UEFI era. A legacy option ROM is 16-bit real-mode x86 code, loaded during POST, given a chance to initialise the card. Video cards historically used this heavily: every video card shipped with a VGA BIOS in a ROM, loaded into the 0xC0000 to 0xC7FFF region of memory, providing INT 10h-compatible text and graphics services.
UEFI does not run 16-bit code. It expects option ROMs in a different format called UEFI Driver, which is a PE32+ executable with a special subsystem value (IMAGE_SUBSYSTEM_EFI_BOOT_SERVICE_DRIVER). A UEFI driver installs protocols on its device handle in the same way as a DXE driver from the firmware itself. A GPU with a UEFI driver installs EFI_GRAPHICS_OUTPUT_PROTOCOL, giving the firmware (and any OS loader) a framebuffer without any of the legacy VGA dance.
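The subsystem value is an ordinary field in the PE optional header, so it can be read with a few lines of code. The sketch below assumes the standard PE/COFF layout (the offset of the PE header stored at 0x3C, a 20-byte COFF header, and the Subsystem field 68 bytes into the optional header). The function name and the display strings are illustrative, not part of any library.

```python
import struct
import sys

# Subsystem values from the PE/COFF format that are relevant to UEFI images.
EFI_SUBSYSTEMS = {
    10: "EFI application",
    11: "EFI boot service driver",
    12: "EFI runtime driver",
    13: "EFI ROM image",
}


def pe_subsystem(image: bytes) -> int:
    """Return the Subsystem field of a PE32 or PE32+ image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional_header = pe_offset + 4 + 20  # skip signature and COFF header
    (magic,) = struct.unpack_from("<H", image, optional_header)
    if magic not in (0x10B, 0x20B):  # PE32 or PE32+
        raise ValueError("unknown optional header magic")
    (subsystem,) = struct.unpack_from("<H", image, optional_header + 68)
    return subsystem


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as handle:
        value = pe_subsystem(handle.read())
    print(value, EFI_SUBSYSTEMS.get(value, "not an EFI subsystem"))
```

Pointing it at a bootloader such as shimx64.efi would be expected to report an EFI application, while a driver extracted from an option ROM would report a boot service driver.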
For several years, both formats lived side by side. Most GPUs since around 2013 ship with both a legacy VBIOS and a GOP (Graphics Output Protocol) driver in the same flash. CSM-enabled systems used the VBIOS; CSM-disabled systems used the GOP driver. As CSM disappears, the VBIOS path is disappearing too, and new GPUs ship GOP-only.
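Both images can share one flash chip because a PCI expansion ROM is a chain of images, each tagged with a code type. The sketch below walks that chain. It assumes the layout from the PCI firmware specification as I understand it: a 0x55 0xAA signature, a pointer at offset 0x18 to a "PCIR" structure, and within that structure an image length in 512-byte units at 0x10, a code type at 0x14 (0x00 for legacy x86, 0x03 for EFI), and a last-image flag in bit 7 of the byte at 0x15. The function name is hypothetical.

```python
import struct

CODE_TYPES = {0x00: "legacy x86", 0x03: "EFI"}


def rom_image_types(rom: bytes) -> list:
    """Return the code type of every image found in a PCI expansion ROM dump."""
    types = []
    offset = 0
    while offset + 0x1A <= len(rom):
        if rom[offset:offset + 2] != b"\x55\xaa":
            break  # no ROM header signature here
        (pcir_ptr,) = struct.unpack_from("<H", rom, offset + 0x18)
        pcir = offset + pcir_ptr
        if pcir + 0x16 > len(rom) or rom[pcir:pcir + 4] != b"PCIR":
            break  # truncated or malformed PCI data structure
        (image_blocks,) = struct.unpack_from("<H", rom, pcir + 0x10)
        types.append(rom[pcir + 0x14])
        last_image = rom[pcir + 0x15] & 0x80
        if last_image or image_blocks == 0:
            break
        offset += image_blocks * 512  # next image starts on a 512-byte boundary
    return types


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as handle:
        for code in rom_image_types(handle.read()):
            print(hex(code), CODE_TYPES.get(code, "other"))
```

On a dump from a card of the transition era, one would expect to see both a legacy x86 entry and an EFI entry. A GOP-only card would list just the EFI image.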
PCIe network cards with PXE boot support work the same way. Legacy PXE option ROMs provided BIOS interrupt-based network access; UEFI uses EFI_SIMPLE_NETWORK_PROTOCOL and EFI_PXE_BASE_CODE_PROTOCOL. A modern server in Frankfurt that PXE-boots a Linux installer is using pure UEFI network code, no legacy involved.
The main pain point today is old hardware on new systems. A ten-year-old RAID card with a legacy-only option ROM simply will not work on a CSM-free UEFI system. The firmware has no way to load its 16-bit code. Upgrading requires either new hardware or finding a UEFI driver update from the vendor, which often does not exist.
Debugging Firmware Issues
When things go wrong in firmware, the symptoms are often cryptic. Here are a few things that regularly cause confusion.
Boot loops without reaching the OS. Usually a corrupted NVRAM variable or a failed Secure Boot check. Booting with Secure Boot temporarily disabled, or resetting BIOS to defaults, usually helps isolate the cause.
Device not detected after DXE. A driver failed to initialise. Check firmware logs, if the BIOS setup exposes them. On server hardware, the BMC usually captures firmware logs.
Wrong OS loads after Windows update. Windows has a habit of rewriting BootOrder to put itself first. Running efibootmgr -o with the desired order (for example, efibootmgr -o 0001,0000 if Boot0001 is the Linux entry) restores it.
Slow cold boots, fast warm boots. Memory training is running on cold boot and its cache is being rebuilt. The MRC (Memory Reference Code) cache persists training data across warm reboots to skip this step, but cold boots (and BIOS updates) invalidate the cache.
Wrong time after boot. The hardware clock stores a bare time with no time zone, and each OS decides how to interpret it. Windows treats it as local time by default. Linux treats it as UTC by default. Dual-boot systems clash. The fix is usually telling Windows to use UTC via a registry edit (the RealTimeIsUniversal value under HKLM\SYSTEM\CurrentControlSet\Control\TimeZoneInformation).
ACPI errors in dmesg. Almost always a firmware bug: the vendor's DSDT has a mistake. Linux kernel workarounds exist for many specific models, but new laptops often ship with broken ACPI that takes months to get fixed.
Firmware Updates: The Capsule Mechanism
Updating firmware used to mean booting from a DOS floppy with a vendor flasher. UEFI defines a standard mechanism called capsule updates that lets the running OS deliver a firmware image to the firmware for installation at the next reboot.
The flow is:
- The OS downloads a firmware capsule file, typically from the vendor or through LVFS (Linux Vendor Firmware Service, an open project that aggregates firmware updates from dozens of vendors and pushes them through fwupd).
- The OS calls UpdateCapsule on the UEFI Runtime Services table, passing a pointer to the capsule.
- Firmware validates the capsule signature against keys stored in NVRAM. The capsule must be signed by the vendor's firmware update key.
- Firmware writes the capsule to a staging area in flash or a reserved memory region and requests a reboot.
- On the next boot, the firmware picks up the staged capsule, applies it, and resumes normal boot.
On Linux, this is all wrapped in fwupd:
$ fwupdmgr refresh
$ fwupdmgr get-updates
$ fwupdmgr update
LVFS has been a quiet revolution. Before it existed, firmware updates on Linux were rare and dangerous, shipped as Windows-only executables that users could not run without a dedicated Windows install. LVFS turned firmware into just another thing that gets updated by the package manager. The result is that Linux laptops in Europe now receive firmware updates on roughly the same cadence as their Windows counterparts, and often faster for vendors that actively support LVFS.
The capsule format itself is defined in the UEFI specification. A capsule header carries a GUID identifying its type, a length, flags, and a payload. The payload is typically a vendor-specific binary blob containing the actual firmware image. The signature verification layer is handled by Authenticode-style PKCS#7 signing using keys provisioned at manufacturing time.
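The header described above is small enough to decode by hand. The following sketch assumes the EFI_CAPSULE_HEADER layout from the UEFI specification: a 16-byte GUID followed by three 32-bit little-endian fields (header size, flags, total capsule image size), and the three flag bits named in the specification. The function name, dictionary keys, and the GUID in the demo are made up for illustration.

```python
import struct
import uuid

HEADER = struct.Struct("<16sIII")  # GUID, HeaderSize, Flags, CapsuleImageSize

FLAG_NAMES = {
    0x00010000: "PERSIST_ACROSS_RESET",
    0x00020000: "POPULATE_SYSTEM_TABLE",
    0x00040000: "INITIATE_RESET",
}


def parse_capsule_header(blob: bytes) -> dict:
    """Decode the fixed-size header at the start of a capsule file."""
    if len(blob) < HEADER.size:
        raise ValueError("blob is shorter than a capsule header")
    guid_raw, header_size, flags, image_size = HEADER.unpack_from(blob)
    return {
        "guid": uuid.UUID(bytes_le=guid_raw),  # EFI GUIDs are mixed-endian
        "header_size": header_size,            # payload starts at this offset
        "flags": [name for bit, name in FLAG_NAMES.items() if flags & bit],
        "image_size": image_size,              # header plus payload, in bytes
    }


if __name__ == "__main__":
    # Hypothetical capsule: made-up GUID, 28-byte header, 4-byte payload.
    guid = uuid.UUID("12345678-1234-5678-1234-567812345678")
    blob = HEADER.pack(guid.bytes_le, HEADER.size, 0x00050000, HEADER.size + 4)
    blob += b"\xde\xad\xbe\xef"
    print(parse_capsule_header(blob))
```

Everything after header_size bytes is the vendor payload, which this sketch deliberately leaves opaque, since its format is vendor-specific.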
A Short History of What Changed and When
It is worth pinning down the timeline, because many developers still think of UEFI as "the new thing".
- 2006: Apple ships the first Intel Macs with EFI 1.10. First mainstream consumer EFI systems.
- 2007: UEFI 2.1 released. GUID-based protocol model matures.
- 2012: Windows 8 released with UEFI support and a Secure Boot requirement for Windows logo certification.
- 2012: Linux distros implement Shim to work with Secure Boot.
- 2017: Intel announces plan to deprecate CSM.
- 2020: Tiger Lake removes most of CSM. BootHole disclosed.
- 2021: Windows 11 released, requires TPM 2.0 and UEFI-native boot.
- 2023: Most new consumer PCs ship without CSM by default.
- 2026: CSM is effectively dead on new hardware. BIOS, in its original sense, is a museum piece.
Every modern platform you touch (Intel, AMD, Apple Silicon, ARM servers, Raspberry Pi's newer boards) uses UEFI or a UEFI-like firmware. Understanding the phases, protocols, and boot flow is no longer optional for anyone who works on operating systems, bootloaders, or low-level debugging.
What Really Changed
If I had to summarise the shift from BIOS to UEFI in a few sentences, it would be this. BIOS was a set of assembly-language interrupt handlers for a 1981 hardware model, wrapped in a 40-year-long stack of compatibility bandages. It left the CPU in 16-bit mode and left every operating system to dig itself out. It had four partition slots, no signature verification, no file system abstraction, no graphics beyond VGA text, and no standard way to update itself.
UEFI is a small operating system of its own, written in C, running in 64-bit mode, with drivers, protocols, file systems, a shell, NVRAM storage, a boot manager, a cryptographic trust chain, and a capsule-based update mechanism. It hands the operating system a running framebuffer, a complete ACPI description of the hardware, and a 64-bit environment. It is not perfect, and the amount of code running before your kernel has grown from kilobytes to megabytes, but the resulting environment is dramatically more capable and dramatically more consistent across vendors.
The lab that accompanies this article walks through the phases interactively, letting you step from power-on through OS handoff and watch the state of the system evolve at each stage. Toggling between legacy BIOS and UEFI paths makes the contrast concrete: you can see exactly which structures exist in one world but not the other, and why CSM had to exist to bridge them for so long.