How PKI Actually Works

Most engineers meet PKI as an annoyance. A certificate expired. A staging hostname does not validate. curl says "unable to get local issuer certificate". A mobile app refuses a perfectly "working" TLS endpoint because the pinned key changed. The system is usually introduced as a bag of operational rituals rather than as a coherent design.

That is backwards. Public key infrastructure exists to solve one specific problem: how does a client that has never seen your server before decide whether the public key it just received really belongs to your service and not to an attacker in the middle? Everything else in PKI, from root stores and intermediates to OCSP stapling and Certificate Transparency, exists to answer that question at internet scale.

This article walks through the system from the bottom up. We will cover the root CA hierarchy, how a certificate chain is actually built and validated, which X.509 extensions matter in practice, why revocation is notoriously awkward, how OCSP stapling and CT logs reduce some of that pain, and what certificate pinning really pins when it is done correctly. It connects directly to the HTTPS and TLS deep-dives because TLS uses PKI for authentication, but the infrastructure is broader than the handshake itself.

PKI Exists Because a Public Key by Itself Has No Identity

Public-key cryptography gives you a key pair:

  • a private key that stays secret
  • a public key that can be shared freely

That is enough to encrypt to the owner or verify signatures from the owner. It is not enough to know who the owner is.

Suppose your browser connects to https://bank.example and the server presents a 2048-bit RSA key or an ECDSA P-256 key. Cryptographically, the key may be perfectly fine. But a raw public key does not say whether it belongs to:

  • the real bank
  • a reverse proxy you control
  • a coffee-shop attacker running a transparent MITM box
  • a malware implant that poisoned DNS and redirected your route

Identity has to be bound to the key somehow. PKI does this with certificates: signed data structures that say, in effect, "this public key is valid for this subject under these conditions until this date, and I, the issuer, vouch for that statement."

The signature is the important part. If a trusted issuer signs the certificate, and the client already trusts that issuer or a chain leading back to it, then the client can accept the binding between identity and public key.

Without that signature chain, TLS would still be able to encrypt bytes, but it would not know whether it had encrypted them to the right server.

A Certificate Is a Signed Claims Document, Not Just a Key Container

In the web PKI, certificates are almost always X.509 v3 certificates encoded in ASN.1 DER. PEM is just base64-wrapped DER with header lines.

At a high level, a certificate contains:

  • the subject public key
  • the issuer name
  • the subject name
  • a serial number
  • validity dates
  • extension fields that define how the certificate may be used
  • the issuer's digital signature over all of the above

When you inspect a server certificate with OpenSSL:

openssl x509 -in cert.pem -text -noout

you are looking at a parsed view of a signed binary structure. The certificate is not trusted because it contains a hostname. It is trusted only if:

  1. the signature verifies under an issuer the client trusts
  2. the chain terminates in a trust anchor in the local trust store
  3. the certificate is valid for the intended use, hostname, and time
  4. revocation and policy checks do not reject it

That last list matters because many certificate bugs are not cryptographic failures. The signature verifies fine. The certificate is rejected because the SAN does not match, the EKU is wrong, the chain is incomplete, the intermediate is unauthorized to issue leaves, or the certificate is expired.

DER, PEM, and ASN.1 Matter Because Validators Consume Structure, Not Pretty Text

Most engineers first see certificates in PEM form:

-----BEGIN CERTIFICATE-----
MIIF...
-----END CERTIFICATE-----

That format is convenient for transport in config files, web servers, and CLI tooling. It is not the native logical form of the certificate. The actual certificate is a DER encoded ASN.1 structure. PEM is just a textual wrapper around those DER bytes.
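To make the "textual wrapper" point concrete, here is a minimal Python sketch of the round trip between DER bytes and PEM text. The function names are illustrative, and the input bytes are a placeholder rather than a real certificate; the wrapping logic does not care what the bytes mean.

```python
import base64

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def der_to_pem(der: bytes) -> str:
    """Wrap DER bytes in certificate armor: base64 text in 64-character lines."""
    b64 = base64.b64encode(der).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "\n".join([PEM_HEADER, *lines, PEM_FOOTER]) + "\n"


def pem_to_der(pem: str) -> bytes:
    """Drop the armor lines and base64-decode whatever is left."""
    body = []
    for raw in pem.splitlines():
        line = raw.strip()
        if line and not line.startswith("-----"):
            body.append(line)
    return base64.b64decode("".join(body))


# Placeholder bytes that merely start like a DER SEQUENCE header.
# This is NOT a real certificate; any byte string round-trips the same way.
fake_der = bytes.fromhex("3082010a0201") + bytes(range(100))

pem = der_to_pem(fake_der)
print(pem)
print("round trip ok:", pem_to_der(pem) == fake_der)
```

Nothing in the PEM layer is signed or validated. Everything a validator cares about lives in the decoded bytes.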

This matters because validators do not check a nice human-readable field list. They parse a binary grammar with explicit tags, lengths, and nested structures. When OpenSSL prints:

openssl x509 -in cert.pem -text -noout

it is decoding ASN.1 objects such as:

  • tbsCertificate, the "to be signed" body
  • signatureAlgorithm
  • signatureValue

Inside tbsCertificate you then have nested fields for:

  • version
  • serial number
  • issuer
  • validity
  • subject
  • subjectPublicKeyInfo
  • extensions

The phrase subjectPublicKeyInfo, or SPKI, matters especially because that is the structure pinning schemes often hash. SPKI is not just the raw modulus or the raw elliptic curve point. It includes the algorithm identifier plus the public key bit string. Because the SPKI depends only on the key and its algorithm, not on the serial number, validity dates, or signature, an SPKI pin survives a renewal that reuses the key, whereas a pin on the whole certificate document does not.

The signature on a certificate is also often misunderstood. The issuer does not sign the PEM text. It signs the DER encoding of the tbsCertificate field. If any byte in that structure changes, including a validity date or an extension flag, the signature no longer verifies.

Conceptually the layout is:

Certificate ::= SEQUENCE {
  tbsCertificate       TBSCertificate,
  signatureAlgorithm   AlgorithmIdentifier,
  signatureValue       BIT STRING
}

and the signature is computed over the DER bytes of tbsCertificate.

That is why naive re-serialization bugs can matter in certificate tooling. DER is a canonical encoding. The exact bytes matter. Validators are not reading a semantic object model detached from encoding. They are checking a canonical signed byte sequence.
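The tag-length-value layout described above can be walked with a few lines of Python. This is a rough sketch under stated assumptions: it handles only single-byte tags and definite lengths, the function names are made up for illustration, and the "certificate" is a hand-built toy with the same three-part outer shape, not a real X.509 object.

```python
import hashlib


def read_tlv(data: bytes, offset: int = 0):
    """Return (tag, header_length, value_length) for the TLV at offset.

    Simplification: single-byte tags only (no high-tag-number form).
    """
    tag = data[offset]
    first = data[offset + 1]
    if first < 0x80:                      # short form: length fits in 7 bits
        return tag, 2, first
    count = first & 0x7F                  # long form: next `count` bytes hold the length
    if count == 0:
        raise ValueError("indefinite length is not allowed in DER")
    length = int.from_bytes(data[offset + 2:offset + 2 + count], "big")
    return tag, 2 + count, length


def split_certificate(der: bytes):
    """Split the outer SEQUENCE into its three raw elements, headers included."""
    tag, header, length = read_tlv(der)
    if tag != 0x30:
        raise ValueError("outer element is not a SEQUENCE")
    parts, pos, end = [], header, header + length
    while pos < end:
        _, h, n = read_tlv(der, pos)
        parts.append(der[pos:pos + h + n])
        pos += h + n
    if len(parts) != 3:
        raise ValueError("expected tbsCertificate, signatureAlgorithm, signatureValue")
    return tuple(parts)


# Toy stand-ins for the three top-level fields (not real certificate content).
toy_tbs = bytes.fromhex("3003020105")     # SEQUENCE { INTEGER 5 }
toy_alg = bytes.fromhex("30020500")       # SEQUENCE { NULL }
toy_sig = bytes.fromhex("030200ff")       # BIT STRING, 0 unused bits
body = toy_tbs + toy_alg + toy_sig
toy_cert = bytes([0x30, len(body)]) + body

tbs, alg, sig = split_certificate(toy_cert)
# The signature covers exactly these bytes, so hash them as they appear.
print("digest of signed bytes:", hashlib.sha256(tbs).hexdigest())
```

The design point is that `split_certificate` returns slices of the original buffer. Decoding the fields into objects and re-encoding them before hashing is where re-serialization bugs creep in.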

For most operators, the practical lesson is simple: certificates are strict binary identity statements with a lot of policy packed into them. If you ever have to debug a parser mismatch, a broken extension, or a pinning hash, it helps to remember that the underlying object is not "some text with a signature". It is a nested binary structure whose fields are interpreted according to X.509 and algorithm-specific rules.

The CA Hierarchy Exists to Keep Root Keys Offline and Rarely Used

The trust model of the public web is hierarchical.

At the top are root certificate authorities. A root CA certificate is self-signed and stored in operating system or browser trust stores. It is the trust anchor. Your machine does not discover a root on the fly. It already has a curated list of roots that platform vendors decided to trust.

Below the root sit one or more intermediate CAs. Intermediates are signed by a root or another intermediate. They are the workhorses of issuance. In a healthy PKI, the root key is kept offline in an HSM-backed ceremony-heavy environment and used sparingly. Operational issuance happens through intermediates.

Below the intermediates sit leaf certificates. These are the certificates servers, APIs, mail gateways, or client-auth systems actually present during TLS.

The hierarchy exists for risk reduction:

  • if a leaf key is compromised, only that leaf is affected
  • if an intermediate is compromised, the root CA can revoke it, or root programs can distrust it, without rotating the root itself
  • if the root key were used for everyday issuance, compromise would be catastrophic and global

This is the same principle as not using your ultimate signing key for every routine operational action.

A typical web chain looks like this:

Leaf certificate:      CN/SAN = api.example.com
    signed by
Intermediate CA:       Let's Encrypt R11
    signed by
Root CA:               ISRG Root X1

The server usually sends the leaf and the intermediate. It usually does not send the root. The client is expected to already have the root in its trust store.

Trust Stores Are the Real Root of Authority

Engineers often say "the certificate chains to a root," but the important detail is which root store the client actually uses.

On the public web, trust anchors come from platform stores such as:

  • the Mozilla root program
  • Apple's trust store
  • Microsoft's trusted root store
  • Android's system CA store

Those programs impose policy and auditing requirements on CAs. A root is not trusted because it is mathematically special. It is trusted because the client software ships with it as a policy choice.

This leads to several operational realities:

  • a certificate can validate on one platform and fail on another if their trust stores differ
  • enterprise environments can add private roots, which changes the effective trust model
  • browsers that use their own store may behave differently from OS-native clients

When an enterprise installs its own root CA on managed laptops, it effectively tells those clients: "you should trust certificates issued by our internal PKI as much as you trust public web PKI for the scopes we permit." That is how corporate TLS inspection boxes work. They mint substitute leaf certificates on the fly, and the managed endpoint accepts them because the enterprise root is trusted locally.

So when you ask whether a certificate is valid, the first hidden question is: valid according to whose trust anchors?

Chain Building Is More Than Just Checking One Signature After Another

People often picture chain validation as a simple linked list:

  1. verify leaf with intermediate public key
  2. verify intermediate with root public key
  3. trust root

That is the basic shape, but real path building is messier.

The client has to construct a candidate path from the presented leaf certificate back to a trust anchor. To do that, it uses fields such as:

  • Issuer: whose name supposedly signed this certificate
  • Authority Key Identifier (AKI): which issuer key is expected
  • Subject Key Identifier (SKI) on candidate issuer certs
  • locally cached intermediates
  • certificates delivered by the server
  • sometimes AIA URLs that point to issuer certificates

Several intermediates may have similar names. A certificate may have multiple possible issuer paths because of cross-signing. The client may prefer one chain over another based on trust store contents, expiration, or policy.

This is why "but I sent the full chain" is not always enough. The client does not blindly accept the chain in the order the server transmitted it. It performs path building and validation according to its own rules.

At a high level, validation looks like:

for each candidate chain:
  verify each signature
  verify issuer/subject linkage
  verify BasicConstraints and path length
  verify KeyUsage permits certificate signing where needed
  verify validity time windows
  verify hostname and EKU on the leaf
  verify revocation / status policy
  verify chain ends in trusted root
accept first chain that satisfies policy

The important phrase is satisfies policy. PKI is not just cryptography. It is cryptography plus policy encoded in certificate metadata and local trust rules.
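To show how much of that loop is policy rather than cryptography, here is a toy Python sketch of the non-signature checks for one candidate chain. Everything in it is an assumption made for illustration: the ToyCert fields, the idea of matching trust anchors by subject name, and the example names. Signature verification, hostname and EKU checks, and revocation are deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ToyCert:
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime
    is_ca: bool = False                 # BasicConstraints CA flag
    path_len: Optional[int] = None      # BasicConstraints path length, if any
    key_cert_sign: bool = False         # KeyUsage keyCertSign bit


def check_chain(chain, trust_anchors, now):
    """Return None if the chain passes, else a short reason string.

    chain is ordered leaf first and excludes the root; the last element
    must name a trust anchor as its issuer (toy model: match by name).
    """
    for depth, cert in enumerate(chain):
        if not (cert.not_before <= now <= cert.not_after):
            return f"{cert.subject}: outside validity window"
        if depth == 0:
            continue                    # the leaf does not issue anything
        child = chain[depth - 1]
        if child.issuer != cert.subject:
            return f"{child.subject}: issuer does not match {cert.subject}"
        if not cert.is_ca:
            return f"{cert.subject}: not a CA (BasicConstraints)"
        if not cert.key_cert_sign:
            return f"{cert.subject}: KeyUsage lacks keyCertSign"
        # Path length limits the CA certificates between this CA and the leaf.
        if cert.path_len is not None and depth - 1 > cert.path_len:
            return f"{cert.subject}: path length constraint exceeded"
    if chain[-1].issuer not in trust_anchors:
        return f"{chain[-1].subject}: does not chain to a trust anchor"
    return None


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical names and dates, chosen only for the example.
leaf = ToyCert("api.example.com", "Example Issuing CA", utc(2024, 12, 1), utc(2025, 3, 1))
inter = ToyCert("Example Issuing CA", "Example Root", utc(2020, 1, 1), utc(2030, 1, 1),
                is_ca=True, path_len=0, key_cert_sign=True)
rogue = ToyCert("other.example.com", "api.example.com", utc(2024, 12, 1), utc(2025, 3, 1))
anchors = {"Example Root"}
now = utc(2025, 1, 1)

print(check_chain([leaf, inter], anchors, now))
print(check_chain([rogue, leaf, inter], anchors, now))
```

The second call is the case BasicConstraints exists for: a leaf trying to act as an issuer. Every signature in such a chain could verify, and the chain should still be rejected.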

Real Path Building Fails in Specific and Surprisingly Repeatable Ways

The abstract chain-building algorithm sounds general, but the practical failure modes are repetitive. If you have debugged enough TLS incidents, you start seeing the same classes of breakage.

Missing intermediates

This is the most common operational mistake. The server is configured with the leaf certificate only, or with the wrong bundle order, and assumes clients will fetch the intermediate somehow.

Some clients do. Some do not. Some succeed only because they cached the intermediate from a previous connection. This produces the classic bug report pattern:

  • Chrome works on one engineer's laptop
  • curl in a container fails
  • some mobile clients fail only on fresh install

That inconsistency is usually not magic. It is path building with different local caches and issuer-fetch behaviour.

Multiple possible issuers

If two intermediates share similar subject names or if cross-signing exists, a client may have more than one possible chain. One path may end at a trusted root, another at an expired or absent one. Different validation stacks may choose differently.

This is why a server operator cannot assume that "the chain file I installed" is the only path a client will consider. Clients do not merely replay the server's preferred story. They build their own.

Name constraints and policy constraints

These are rarer on the public web, but when they appear they can invalidate otherwise plausible chains. An intermediate may be technically capable of signing a leaf, yet not authorised by policy extensions to issue for that namespace or purpose.

Old roots, new roots, and weird compatibility boundaries

A root rollover or cross-signing transition can leave parts of the ecosystem in an awkward middle state:

  • older clients trust one root but not another
  • newer clients prefer a cleaner path
  • a stale intermediate bundle nudges some clients toward the wrong chain

This is why CA transitions often involve careful compatibility matrices. Chain building is deterministic inside one validation engine with one trust store, but the public internet is not one validation engine with one trust store.

Time skew

Certificate validity depends on local time. If a device clock is badly wrong, a perfectly good chain may fail with "not yet valid" or "expired" errors. This is pedestrian, but it matters, especially on embedded systems and lab environments without reliable time sync.

AIA fetching as an accidental dependency

Authority Information Access can save you when the server omitted an intermediate, but it can also create fragile hidden dependencies. If your chain validates only because a client fetched an issuer certificate over HTTP from an AIA URL, you do not really have a robustly configured server. You have a lucky client.

The practical rule for operators is therefore stricter than "my browser reaches the site". The real rule is:

the server should present a complete and sensible chain such that clients do not need to guess, fetch, or recover

That is what keeps validation deterministic across browsers, mobile apps, API clients, containers, and internal service meshes.

What the Server Actually Sends During TLS

During a TLS handshake, the server sends a certificate list in its Certificate message. In most web deployments that list contains:

  • the leaf certificate for the hostname
  • one or more intermediates needed to validate it

The server typically omits the root certificate because:

  • it wastes bytes
  • the client should already know the trust anchor
  • sending the root does not make an untrusted root trusted

If the server omits an intermediate, some clients may still succeed because they cached the intermediate from an earlier connection or fetched it via AIA. Others will fail immediately with an issuer error.

This is why browsers can sometimes appear "more forgiving" than minimal TLS clients. Browsers have richer caches and more path-building logic. A stripped-down container image running curl built against OpenSSL with an incomplete CA bundle may fail where Chrome succeeds.

If you want to inspect the chain a server presents:

openssl s_client -connect debtman.dev:443 -servername debtman.dev -showcerts

That command shows the certificates the server sent, not necessarily the exact path your browser ultimately built.

The Leaf Certificate Is Where Hostname Validation Happens

For a web server certificate, the most important identity check is the hostname match. Modern clients use the Subject Alternative Name extension, not the legacy Common Name, for DNS identity.

If the user connects to api.example.com, the leaf certificate must contain a SAN entry that covers it, such as:

  • DNS:api.example.com
  • or a wildcard like DNS:*.example.com if policy allows and the hostname matches wildcard rules

The Common Name used to be overloaded for this purpose, but modern validation rules require SAN. A certificate with a beautiful CN and no SAN may still fail validation.

This is a critical design point: the leaf certificate proves not just "somebody owns this key." It proves "this key is authorized for this exact name or set of names."

That is the bridge between PKI and HTTPS. TLS uses the certificate chain for authentication, but HTTPS uses hostname validation to decide whether the authenticated identity matches the URL the user intended.
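A simplified Python sketch of SAN DNS matching follows. It is not a full implementation of the standard rules: it assumes ASCII names, accepts a wildcard only as the entire left-most label, and uses a crude "at least three labels" check where real clients consult public-suffix data. The function names are illustrative.

```python
def dns_name_matches(hostname: str, pattern: str) -> bool:
    """Compare one hostname against one SAN dNSName entry."""
    host = hostname.lower().rstrip(".").split(".")
    pat = pattern.lower().rstrip(".").split(".")
    if len(host) != len(pat) or "" in host:
        return False                      # a wildcard never spans extra labels
    if pat[0] == "*":
        # The wildcard stands for exactly one left-most label.
        # Requiring three labels rejects patterns such as *.com (assumption:
        # a stand-in for the public-suffix checks real clients perform).
        return len(pat) >= 3 and host[1:] == pat[1:]
    return host == pat


def cert_covers_hostname(hostname: str, san_dns_names) -> bool:
    """True if any SAN entry covers the hostname. The CN is never consulted."""
    return any(dns_name_matches(hostname, name) for name in san_dns_names)


sans = ["example.com", "*.example.com"]
for name in ["api.example.com", "example.com", "a.b.example.com", "example.org"]:
    print(name, cert_covers_hostname(name, sans))
```

Note that `*.example.com` covers neither `example.com` itself nor `a.b.example.com`, which is why many certificates list both the bare domain and the wildcard.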

X.509 Extensions Are Where the Real Rules Live

The signature tells you the issuer signed the certificate. The extensions tell you what the certificate is allowed to mean.

In practice, several X.509 v3 extensions matter constantly.

Basic Constraints

This extension says whether a certificate is allowed to act as a CA.

Typical forms:

  • leaf certificate: CA:FALSE
  • intermediate CA: CA:TRUE

If a leaf certificate says CA:TRUE, or if a client accepts a chain where a non-CA certificate issued another certificate, something is badly wrong. This extension is one of the core protections that stops arbitrary certificates from issuing more certificates.

It can also carry a path length constraint limiting how many subordinate CA levels may appear below it.

Example:

BasicConstraints: critical, CA:TRUE, pathlen:0

That means the certificate may issue leaf certificates but not additional subordinate CAs.

Key Usage

This extension controls which cryptographic operations the key may perform.

Common flags include:

  • digitalSignature
  • keyEncipherment
  • keyCertSign
  • cRLSign

For a CA certificate, keyCertSign is crucial. For a TLS leaf, digitalSignature is often required because modern TLS certificate authentication is signature-based.

If a certificate's key usage does not permit the operation being attempted, validation should fail even if the signature chain is otherwise intact.

Extended Key Usage

EKU narrows the intended application context.

Common values:

  • serverAuth
  • clientAuth
  • codeSigning
  • emailProtection
  • OCSPSigning

A certificate valid for code signing is not automatically valid for a TLS server. A client certificate for mTLS is not automatically a web server certificate. EKU prevents one certificate from being promiscuously reused across unrelated trust contexts.

For HTTPS, the leaf should usually contain serverAuth.

Subject Alternative Name

As discussed above, SAN carries DNS names, IP addresses, email addresses, or URIs depending on the application. For web PKI, this is where hostname identity lives.

Authority Key Identifier and Subject Key Identifier

These are helpers for path building.

  • SKI identifies the certificate's own public key
  • AKI points toward the key that signed this certificate

They help clients match a child certificate with the right issuer, especially when names alone are ambiguous.

Authority Information Access

AIA often contains:

  • an OCSP responder URL
  • a URL from which the issuing CA certificate can be downloaded

This is part of why clients can sometimes recover from missing intermediates or perform OCSP status checks without local configuration.

CRL Distribution Points

This extension tells clients where to fetch certificate revocation lists, though many web clients avoid heavy CRL usage for latency and reliability reasons.

Why Some Extensions Are Marked Critical

An X.509 extension can be marked critical, which means a client that does not understand it must reject the certificate.

This prevents an older or incomplete validator from silently ignoring a rule that the issuer considered mandatory.

For example, BasicConstraints is commonly critical on CA certificates because ignoring it would be dangerous. If a validator did not understand whether a certificate was permitted to issue sub-certificates, it should fail closed rather than guess.

Criticality is one of the places where PKI reveals its policy-driven nature. The issuer is saying: "this constraint is not informational; it is part of the certificate's meaning."
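The fail-closed rule is small enough to sketch directly. In this hedged Python example, the set of "understood" extensions describes a hypothetical validator that implements only four of them; the OIDs themselves are the standard ones, and the (oid, critical) pair representation is an assumption made for brevity.

```python
# Extensions this hypothetical validator knows how to process.
UNDERSTOOD = {
    "2.5.29.15": "keyUsage",
    "2.5.29.17": "subjectAltName",
    "2.5.29.19": "basicConstraints",
    "2.5.29.37": "extKeyUsage",
}


def unhandled_critical(extensions, understood=UNDERSTOOD):
    """Return the OIDs that force rejection.

    extensions is an iterable of (oid, critical) pairs. An unknown
    non-critical extension may be ignored; an unknown critical one may not.
    """
    return [oid for oid, critical in extensions
            if critical and oid not in understood]


leaf_extensions = [
    ("2.5.29.19", True),                    # basicConstraints, critical
    ("2.5.29.17", False),                   # subjectAltName
    ("1.3.6.1.4.1.11129.2.4.2", False),     # embedded SCT list: unknown here, but non-critical
]
constrained_ca_extensions = [
    ("2.5.29.19", True),
    ("2.5.29.30", True),                    # nameConstraints, critical, not implemented here
]

print(unhandled_critical(leaf_extensions))
print(unhandled_critical(constrained_ca_extensions))
```

A non-empty result means the certificate must be rejected. This toy validator cannot enforce name constraints, so it must not pretend the constraint was absent.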

Intermediate CAs Are Operational Blast Shields

Why not just have the root issue the website certificate directly?

Because intermediates create containment and operational flexibility.

A public CA may operate several intermediates for different purposes:

  • web server issuance
  • client authentication
  • code signing
  • region-specific or policy-specific issuance

If one intermediate has to be retired, audited, or distrusted, that is far less painful than rotating the root that millions of clients trust.

Intermediates also let roots stay offline. That matters because the root private key is existential. If it is compromised, every certificate beneath it becomes suspect.

The web PKI therefore treats roots as strategic assets and intermediates as tactical issuers.

Cross-Signing Complicates Chains but Helps Compatibility

Sometimes the same intermediate or root is made reachable through multiple trust paths. The most common reason is compatibility with older clients.

In a cross-signing setup, one CA signs another CA's certificate so that clients trusting the signer can also reach the new key material.

This means a single leaf might validate through more than one chain depending on the client trust store. That is why the chain a browser uses is not always identical to the chain you expected from the server configuration.

Cross-signing is useful during transitions, but it also makes path building more complex:

  • multiple candidate paths may exist
  • some paths may end in trust anchors a given client does not have
  • one path may be expired while another remains valid

This is another reason engineers should think of PKI validation as path construction plus policy evaluation, not just "check the next signature."

Revocation Is the Most Awkward Part of the Whole System

Certificates expire naturally, but sometimes you need to invalidate one before it expires:

  • the private key was compromised
  • the certificate was issued by mistake
  • the subject no longer controls the hostname
  • the CA or intermediate itself is being distrusted

That is revocation.

In theory revocation sounds simple. The issuer publishes a list or an online responder says "good", "revoked", or "unknown". In practice it is hard because the check happens exactly when the client is trying to establish a secure connection, often on unreliable networks, under latency pressure.

If revocation checking is mandatory and blocking, what happens when the OCSP responder is down? Or the captive portal blocks it? Or a hotel WiFi intercepts it? If the client fails closed, users lose connectivity. If it fails open, attackers can block the status check and keep using a revoked certificate.

This is the classic revocation dilemma.

CRLs

A certificate revocation list is a periodically published signed list of revoked serial numbers.

Advantages:

  • simple conceptually
  • signed and cacheable
  • does not require a live query for every connection

Disadvantages:

  • can become large
  • updates are coarse rather than per-connection fresh
  • clients may not fetch them reliably

OCSP

The Online Certificate Status Protocol lets a client ask about one specific certificate status.

Advantages:

  • lighter than downloading whole CRLs
  • status can be fresher

Disadvantages:

  • introduces latency and privacy leakage if the client asks directly
  • responder availability becomes part of connection setup
  • clients often softened failure semantics in practice

The web eventually converged on a partial improvement: have the server staple an OCSP response into the TLS handshake.

Must-Staple Tried to Tighten the Rule

One of the obvious reactions to soft-fail revocation is: why not require stapling and reject the connection if the staple is missing?

That idea exists in the form of the TLS Feature extension, often called Must-Staple. A certificate can include a signal saying, in effect, that the relying client should expect stapled OCSP status during TLS.

In theory this is attractive:

  • the client does not need a separate live OCSP lookup
  • the server is forced to keep status fresh
  • the fail-open gap narrows, because a missing staple becomes a reason to reject the connection

In practice, Must-Staple never became a universal fix because it raises operational stakes sharply. If the server fails to refresh its OCSP response or some edge path drops the staple, clients that honour Must-Staple may hard-fail the connection. That is secure, but it also turns status freshness into a hard availability dependency.

This is the recurring pattern in PKI: every time you try to make policy stricter, you discover a new way to turn internet unreliability into user-visible outages.

So Must-Staple is useful in some controlled environments, but it did not simplify revocation enough to become the final answer for the whole web.

OCSP Stapling Moves the Status Fetch from Client to Server

With OCSP stapling, the server periodically fetches a signed OCSP response from the CA and includes it during TLS. The client can validate that response without making its own direct request to the CA's OCSP responder.

This helps in three ways:

  1. Latency: the client avoids an extra round trip to the CA
  2. Privacy: the CA does not learn which sites the client is visiting in real time
  3. Reliability: the server can refresh status ahead of time instead of every client depending on responder reachability

The stapled response is signed by the issuer or an authorized OCSP responder key and includes:

  • certificate status
  • thisUpdate / nextUpdate timing
  • the certificate serial being described

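The server "staples" it to the handshake: in TLS 1.2 as a separate CertificateStatus message, and in TLS 1.3 as an extension on the certificate entry inside the Certificate message. The client checks the signature, freshness, and identity of the stapled response.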

But stapling is still not magic:

  • the server has to keep the response fresh
  • not every deployment enables it correctly
  • stapling applies mainly to leaf status, not the entire universe of PKI failure modes

So stapling improves revocation checking. It does not make revocation elegant.
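The client-side checks on a staple, minus the signature verification, can be sketched in a few lines of Python. The ToyOcspResponse type, the five-minute clock-skew allowance, and the serial number are all assumptions for illustration; a real client also verifies the responder's signature and authorization before trusting any of these fields.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ToyOcspResponse:
    serial: int                 # serial number the response describes
    status: str                 # "good", "revoked", or "unknown"
    this_update: datetime
    next_update: datetime


def staple_acceptable(resp, expected_serial, now, max_skew=timedelta(minutes=5)):
    """Identity, status, and freshness checks on an already-verified response."""
    if resp.serial != expected_serial:
        return False            # response is about some other certificate
    if resp.status != "good":
        return False
    return resp.this_update - max_skew <= now <= resp.next_update + max_skew


issued = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
staple = ToyOcspResponse(serial=0x1234, status="good",
                         this_update=issued,
                         next_update=issued + timedelta(days=7))

print(staple_acceptable(staple, 0x1234, issued + timedelta(days=3)))
print(staple_acceptable(staple, 0x1234, issued + timedelta(days=8)))
```

The second call shows the operational catch: a server that stops refreshing its staple starts handing out stale responses once nextUpdate passes.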

Browsers Also Shifted Toward CRLite, Blocklists, and Soft-Fail Pragmatism

At internet scale, browsers learned that pure online revocation checks were not enough. They layered additional mechanisms:

  • hardcoded or remotely updated revocation blocklists for major incidents
  • CRLite-style compressed revocation data in some ecosystems
  • soft-fail behavior in many ordinary OCSP failure scenarios

This is worth emphasizing because many engineers assume revocation behaves like a clean always-online oracle. It does not. Browsers mix several strategies because the real internet is too unreliable and adversarial to depend on one blocking status query model.

That is also why revocation incidents often become browser-vendor policy events rather than purely CA-protocol events.

Certificate Transparency Exists Because CAs Sometimes Mis-Issue

Public web PKI has a structural weakness: any publicly trusted CA in the relevant trust store can issue a certificate for your domain. Most of the time they do not, because policy, audits, domain validation controls, and CA governance exist. But if a CA is compromised, coerced, or simply makes a mistake, a rogue certificate can appear.

Certificate Transparency addresses this by making public certificate issuance visible.

A CT log is an append-only, publicly auditable log of issued certificates or pre-certificates. When a CA issues a public TLS certificate, it submits it to one or more CT logs and receives Signed Certificate Timestamps (SCTs). Those SCTs prove the certificate was promised inclusion in public logs.

Modern browsers require SCT evidence for publicly trusted web certificates. In effect the browser says:

"It is not enough that a trusted CA signed this certificate. The issuance also has to be visible to the ecosystem."

This changes the security model in an important way.

CT does not prevent bad issuance in real time.

What it does is make bad issuance detectable:

  • domain owners can monitor logs for unexpected certificates
  • browser vendors and researchers can audit CA behavior
  • append-only Merkle-tree properties make hidden split views harder

That is a huge improvement over the old world where a mis-issued certificate might exist without the domain owner learning about it.

How CT Logs Work at a High Level

A CT log is a Merkle-tree-backed append-only structure.

When a CA submits a certificate to the log, the log returns an SCT saying, roughly:

  • I received this certificate
  • I commit to include it in my append-only log within the maximum merge delay
  • here is my signed promise

Browsers later see the SCT in one of three places: embedded in the certificate itself, delivered in a TLS extension during the handshake, or carried inside a stapled OCSP response. Auditors and monitors check that the certificate actually appears in the log. Merkle inclusion proofs and consistency proofs help verify that the log has not presented incompatible histories.

The point is not that every browser verifies every detail on every page load. The point is that the ecosystem can audit the issuance universe after the fact in a cryptographically accountable way.

For operators, CT matters because it makes certificate issuance observable. If someone manages to obtain a fraudulent certificate for your domain, log monitoring is one of the fastest ways to discover it.

Merkle Proofs Are What Make CT Auditable Instead of Merely Public

If CT logs were just giant published lists, they would still be useful, but they would not be robust against subtle equivocation. The important improvement is the Merkle-tree structure.

In a Merkle tree:

  • each leaf is hashed
  • internal nodes are hashes of child hashes
  • the root hash commits to the entire tree contents

That gives you two valuable proof types.

Inclusion proofs

An inclusion proof shows that a specific certificate entry really is in a particular tree whose root hash is known. The verifier does not need the whole log. It needs the entry, the sibling hashes along the path, and the signed tree head.
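Here is a compact Python sketch of that idea. It follows the hashing convention CT uses (a 0x00 prefix for leaves, 0x01 for interior nodes, trees split at the largest power of two below the size), but the entries are plain placeholder byte strings rather than real log leaf structures, and signed tree heads are not modelled.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    """Largest power of two strictly less than n (for n >= 2)."""
    return 1 << ((n - 1).bit_length() - 1)


def tree_root(entries) -> bytes:
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    k = split_point(n)
    return node_hash(tree_root(entries[:k]), tree_root(entries[k:]))


def audit_path(index: int, entries):
    """Sibling hashes from the leaf up to the root (what a log would serve)."""
    n = len(entries)
    if n <= 1:
        return []
    k = split_point(n)
    if index < k:
        return audit_path(index, entries[:k]) + [tree_root(entries[k:])]
    return audit_path(index - k, entries[k:]) + [tree_root(entries[:k])]


def root_from_path(leaf: bytes, index: int, size: int, path) -> bytes:
    if size == 1:
        if path:
            raise ValueError("proof too long")
        return leaf
    if not path:
        raise ValueError("proof too short")
    k = split_point(size)
    *rest, sibling = path               # the last element is the outermost sibling
    if index < k:
        return node_hash(root_from_path(leaf, index, k, rest), sibling)
    return node_hash(sibling, root_from_path(leaf, index - k, size - k, rest))


def verify_inclusion(entry, index, size, path, expected_root) -> bool:
    """Recompute the root from one entry plus its proof; never needs the full log."""
    if not 0 <= index < size:
        return False
    try:
        return root_from_path(leaf_hash(entry), index, size, path) == expected_root
    except ValueError:
        return False


entries = [f"cert-{i}".encode() for i in range(7)]   # placeholder log entries
root = tree_root(entries)
proof = audit_path(3, entries)
print(len(proof), verify_inclusion(entries[3], 3, len(entries), proof, root))
```

The verifier touches only the entry, a handful of hashes, and the root. That logarithmic proof size is what lets monitors and auditors check a log without mirroring it.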

Consistency proofs

A consistency proof shows that a newer tree is an append-only extension of an older tree rather than a rewritten alternative history. This matters because a dishonest log could otherwise try to present different views to different observers.

Those proof types are why CT is more than "publish certificates somewhere". It is a cryptographically checkable accountability system. Logs commit to a history, and auditors can test whether later states are consistent append-only growth from earlier ones.

For the ordinary site operator, the Merkle details are mostly hidden. But they explain why CT can support ecosystem-wide monitoring without every participant downloading the full issuance universe all the time.

They also explain why SCTs are only promises of inclusion, not inclusion itself. The SCT says the log committed to merge the certificate within a maximum time. Auditors and monitors then verify that this promise was honoured and that the log's history remains append-only.

That structure is what gives CT real security value. Without the proof machinery, "public log" would be a transparency slogan. With Merkle proofs, it becomes an auditable distributed accountability mechanism.

Monitoring CT Logs Is a Real Operational Discipline

CT gives visibility only if somebody is actually watching.

Large organisations therefore run or buy monitoring that alerts on:

  • unexpected certificates for production domains
  • certificates for lookalike subdomains
  • new intermediates or unusual issuers
  • wildcard issuance where none was expected
  • certificates for forgotten legacy domains that still matter

This is important because rogue issuance is often less dramatic than people imagine. The attack path may be:

  1. compromise or social-engineer a domain validation channel
  2. obtain a seemingly valid certificate from a trusted CA
  3. use that certificate for a narrow interception window

If nobody monitors CT logs, a rogue certificate may go unnoticed until after it has been used. If monitoring exists, it can be flagged soon after the entry appears in a log, often within minutes.

So CT should be thought of as a detection surface, not a magic prevention shield. Its power comes from shortening the time between issuance and detection.
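
A few of the alert rules listed earlier in this section can be sketched as a simple filter. This is an illustration only: LoggedCert is a hypothetical, flattened view of a log entry, the issuer and domain names are made up, and fetching and parsing real log entries is left out. Lookalike-domain detection is also out of scope here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LoggedCert:
    # Hypothetical simplified view of one CT log entry.
    dns_names: tuple
    issuer: str

def base_name(name: str) -> str:
    # Strip a leading wildcard label so "*.example.com" maps to "example.com".
    name = name.lower()
    return name[2:] if name.startswith("*.") else name

def is_under(name: str, domain: str) -> bool:
    return name == domain or name.endswith("." + domain)

def review(cert: LoggedCert, watched_domains: set, expected_issuers: set,
           wildcards_expected: bool = False) -> list:
    # Only certificates naming a watched domain are interesting.
    relevant = [n for n in cert.dns_names
                if any(is_under(base_name(n), d) for d in watched_domains)]
    if not relevant:
        return []
    reasons = []
    if cert.issuer not in expected_issuers:
        reasons.append(f"unexpected issuer: {cert.issuer}")
    if not wildcards_expected:
        reasons += [f"unexpected wildcard: {n}"
                    for n in relevant if n.startswith("*.")]
    return reasons

watched = {"example.com"}
issuers = {"Let's Encrypt R11"}
observed = [
    LoggedCert(("shop.example.com",), "Let's Encrypt R11"),
    LoggedCert(("*.example.com",), "Some Other CA"),
    LoggedCert(("unrelated.test",), "Some Other CA"),
]

for cert in observed:
    for reason in review(cert, watched, issuers):
        print(cert.dns_names, "->", reason)
```

Only the second entry produces alerts: it names a watched domain, comes from an issuer outside the expected set, and is a wildcard.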

CT Logs, OCSP, and Revocation Solve Different Problems

These mechanisms often get lumped together, but they are not interchangeable.

  • Chain validation answers: was this certificate issued by a trusted hierarchy and is it structurally valid?
  • Revocation / OCSP / CRLs answer: was this once-valid certificate later invalidated?
  • Certificate Transparency answers: did this issuance become publicly visible so others can detect suspicious or unauthorized certificates?

CT is not revocation. A certificate can be in CT and still be maliciously issued. OCSP is not CT. A certificate can have a "good" OCSP response and still have been mis-issued. You need the combination because PKI has multiple failure modes.

What Certificate Pinning Actually Pins

Certificate pinning is one of the most misunderstood parts of this space.

Many engineers say "we pin the certificate" when what they should mean is "we pin the public key material we expect to see somewhere in the validated chain."

There are several possible things you could pin:

  • the entire leaf certificate
  • the entire intermediate certificate
  • the raw public key
  • the SubjectPublicKeyInfo (SPKI) hash

The operationally sensible answer is usually SPKI pinning.

Why? Because certificates are short-lived operational wrappers around key material. If you pin the whole leaf certificate, you will break clients every time you renew that certificate even if the same key is reused. If you pin the SPKI, a renewed certificate carrying the same public key still matches.

That is why the right mental model is:

pinning usually pins a key, not a certificate document

and more specifically:

well-designed pinning usually pins the SPKI hash of one or more acceptable keys

This can apply to:

  • the leaf key
  • a backup leaf key prepared for rotation
  • sometimes an intermediate key, depending on the trust design
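
As a minimal sketch, an SPKI pin is the base64-encoded SHA-256 digest of the DER-encoded SubjectPublicKeyInfo. The byte strings below are placeholders, not real keys. With real certificates you would first extract the SPKI bytes, for example with openssl x509 -in leaf.pem -pubkey -noout | openssl pkey -pubin -outform DER. The helper names are assumptions for illustration.

```python
import base64
import hashlib

def spki_pin(spki_der: bytes) -> str:
    # Base64 of SHA-256 over the DER-encoded SubjectPublicKeyInfo.
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")

def chain_matches_pins(chain_spkis: list, allowed_pins: set) -> bool:
    # Accept if any key in the validated chain matches any configured pin.
    return any(spki_pin(spki) in allowed_pins for spki in chain_spkis)

# Placeholder bytes standing in for real DER-encoded SPKI structures.
current_key = b"placeholder-spki-current"
backup_key = b"placeholder-spki-backup"
unknown_key = b"placeholder-spki-unknown"

# Pin the live key plus a backup prepared for rotation.
allowed = {spki_pin(current_key), spki_pin(backup_key)}

print(chain_matches_pins([current_key], allowed))  # True: renewal reusing the key
print(chain_matches_pins([backup_key], allowed))   # True: planned rotation
print(chain_matches_pins([unknown_key], allowed))  # False: unpinned key
```

The certificate document never enters the comparison. Serial number, validity dates, and issuer can all change across renewals without affecting the pin, as long as the key stays the same.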

Why HPKP Died

HTTP Public Key Pinning, or HPKP, tried to bring pinning to browsers through an HTTP response header. In theory this let sites tell browsers which keys would be acceptable on future visits.

In practice it was dangerous.

If you misconfigured HPKP, you could lock users out of your own site for the duration of the pin. If an attacker briefly controlled your headers, they could pin their keys and effectively brick access for users even after you recovered the server. This was a hostage-taking primitive disguised as a security feature.

Browsers eventually removed HPKP because:

  • the operational failure mode was severe
  • backup-pin handling was easy to get wrong
  • the feature was useful mainly to sophisticated operators
  • CT provided a safer ecosystem-level answer to rogue issuance detection

Pinning did not disappear entirely. It mostly moved to places with tighter control, such as mobile apps or dedicated clients, where the operator controls both ends more directly.

The Correct Place of Pinning in the Validation Stack

Pinning is not a replacement for PKI validation. It is an extra restriction layered on top.

A robust client usually does:

  1. complete ordinary path validation
  2. verify hostname and EKU
  3. check revocation / status policy
  4. then enforce pinning constraints

That matters because pinning is not supposed to say "ignore the web PKI, only compare one hash." It is supposed to say "after normal PKI says this connection is acceptable, additionally require that the observed key material matches one of my pre-authorized pins."
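
The ordering can be sketched as a decision function. The Checks record is a hypothetical summary of results a TLS library would already have produced, and treating an empty pin set as "no pinning configured" is an assumption of this example, not a rule from any particular client.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Checks:
    # Hypothetical summary of what ordinary PKI validation concluded.
    path_valid: bool
    hostname_and_eku_ok: bool
    status_ok: bool
    chain_pins: frozenset  # SPKI pins computed from the validated chain

def decide(checks: Checks, allowed_pins: frozenset) -> tuple:
    # Ordinary PKI checks come first; pinning can only narrow the result.
    if not checks.path_valid:
        return False, "path validation failed"
    if not checks.hostname_and_eku_ok:
        return False, "hostname or EKU mismatch"
    if not checks.status_ok:
        return False, "status policy failed"
    # Assumption: an empty pin set means pinning is not configured.
    if allowed_pins and not (checks.chain_pins & allowed_pins):
        return False, "pin mismatch"
    return True, "accepted"

pins = frozenset({"pin-of-our-leaf-key", "pin-of-our-backup-key"})

valid_but_unpinned = Checks(True, True, True, frozenset({"pin-of-other-key"}))
pinned_but_invalid = Checks(False, True, True, frozenset({"pin-of-our-leaf-key"}))

print(decide(valid_but_unpinned, pins))  # (False, 'pin mismatch')
print(decide(pinned_but_invalid, pins))  # (False, 'path validation failed')
```

The second case is the important one: a matching pin never rescues a chain that failed normal validation.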

So the thing pinning protects against is narrower:

  • a rogue or compromised publicly trusted CA issuing an otherwise valid-looking certificate
  • some enterprise interception scenarios you intentionally want to reject

Pinning is not a substitute for a CA hierarchy. It is a way to narrow trust further for clients that can tolerate the operational risk.

A Concrete Let's Encrypt Chain

Consider a representative Let's Encrypt-backed site. The server might present:

Leaf:         CN/SAN = example.com, *.example.com
Issuer:       Let's Encrypt R11
Signature:    made by R11 private key
 
Intermediate: Let's Encrypt R11
Issuer:       ISRG Root X1
Signature:    made by ISRG Root X1 private key
 
Root:         ISRG Root X1
Stored locally in trust store

The client validates as follows:

  1. parse the leaf and intermediate
  2. verify the leaf was signed by R11
  3. verify R11 is a CA (Basic Constraints CA:TRUE) and is allowed to sign certificates (Key Usage includes keyCertSign)
  4. verify R11 was signed by ISRG Root X1
  5. verify ISRG Root X1 is in the trust store
  6. verify the leaf SAN matches the requested hostname
  7. verify time validity, EKU, and policy constraints
  8. verify status signals such as stapled OCSP if required
  9. verify SCT / CT expectations for public web PKI

Only then is the leaf public key accepted as the authenticated TLS identity.
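
The path-building part of that sequence can be sketched with a toy model. This is not real validation: signature checks are replaced by issuer/subject name chaining, the Cert record is a hypothetical simplification of X.509, and the dates are invented. It shows how a client walks from the leaf to a trust anchor and where a missing intermediate breaks the walk.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Cert:
    # Hypothetical, heavily simplified certificate record.
    subject: str
    issuer: str
    is_ca: bool
    not_before: datetime
    not_after: datetime

def build_path(leaf: Cert, intermediates: list, trust_store: dict,
               now: datetime) -> list:
    # Walk issuer links from the leaf until a trust anchor is reached.
    by_subject = {c.subject: c for c in intermediates}
    path, current = [leaf], leaf
    while True:
        if not (current.not_before <= now <= current.not_after):
            raise ValueError(f"{current.subject}: outside validity period")
        if current is not leaf and not current.is_ca:
            raise ValueError(f"{current.subject}: not a CA certificate")
        if current.issuer in trust_store:
            return path + [trust_store[current.issuer]]
        parent = by_subject.get(current.issuer)
        if parent is None or parent in path:
            raise ValueError(f"no issuer found for {current.subject}")
        path.append(parent)
        current = parent

def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)

# Invented dates; names mirror the chain described above.
root = Cert("ISRG Root X1", "ISRG Root X1", True, utc(2015, 1, 1), utc(2035, 1, 1))
r11 = Cert("Let's Encrypt R11", "ISRG Root X1", True, utc(2024, 1, 1), utc(2027, 1, 1))
leaf = Cert("example.com", "Let's Encrypt R11", False, utc(2025, 1, 1), utc(2025, 4, 1))
trust_store = {root.subject: root}
now = utc(2025, 2, 1)

path = build_path(leaf, [r11], trust_store, now)
print(" -> ".join(c.subject for c in path))
# example.com -> Let's Encrypt R11 -> ISRG Root X1

try:
    build_path(leaf, [], trust_store, now)  # server forgot the intermediate
except ValueError as err:
    print(err)  # no issuer found for example.com
```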

The lab that accompanies this post visualizes exactly this sort of chain: a Let's Encrypt-style leaf anchored at an ISRG root through an intermediate, with the important X.509 constraints shown on each step.

Why This Matters During the TLS Handshake

In the HTTPS deep dive and the TLS 1.3 post, the certificate chain appears in the middle of the handshake. That can make it look like the chain is just one more handshake blob.

It is not.

TLS does two distinct things:

  • key agreement through ephemeral Diffie-Hellman, which gives confidentiality and forward secrecy
  • authentication through certificates and signature verification, which binds the connection to an identity

PKI is the authentication half.

The server's leaf certificate does not usually encrypt the session. In TLS 1.3, the certificate mainly proves identity, and CertificateVerify proves possession of the corresponding private key for the handshake transcript. The actual session confidentiality comes from ephemeral key exchange.

This distinction is crucial because it explains several common misconceptions:

  • stealing a certificate file without the private key is not enough to impersonate the server
  • stealing the long-term private key is bad for authentication but, in TLS 1.3, does not retroactively decrypt old sessions, because the certificate key only signs the handshake and the session keys come from the ephemeral key exchange
  • renewing a certificate does not necessarily change the pinned key if the key pair is reused

PKI authenticates the server inside TLS. It is not the same thing as the TLS key schedule.

How to Debug PKI Problems Without Guessing

PKI failures often feel mysterious because several layers are involved at once: certificate contents, chain delivery, trust store contents, hostname matching, status checks, and sometimes application-level pinning. The fastest way to get unstuck is to debug those layers separately.

Step 1: Inspect what the server actually presents

Start with:

openssl s_client -connect example.com:443 -servername example.com -showcerts -status

This tells you:

  • which certificates the server sent
  • whether a stapled OCSP response was present (the -status flag requests one)
  • which TLS parameters were negotiated

It does not tell you automatically that the chain is good according to your target client. It tells you the raw input the client received.

Step 2: Inspect the leaf and intermediate contents

Take each PEM block and decode it:

openssl x509 -in leaf.pem -text -noout
openssl x509 -in intermediate.pem -text -noout

Check specifically:

  • SAN entries
  • issuer and subject
  • validity dates
  • Basic Constraints
  • Key Usage
  • Extended Key Usage
  • AIA and CRL distribution points

Do not skim. Most real failures are sitting plainly in these fields.

Step 3: Verify the chain explicitly

Use openssl verify with the right trust anchor and any needed intermediates:

openssl verify -CAfile root.pem -untrusted intermediate.pem leaf.pem

This is one of the cleanest ways to tell three failures apart: "the certificate content is broken", "the server failed to deliver a usable chain", and "the local trust store lacks the required root."

Step 4: Test hostname matching, not just signature validity

A certificate can verify cryptographically and still fail for the actual destination name. The SAN list must cover the real hostname the client requested.

Classic mistakes include:

  • certificate valid for www.example.com but user connects to api.example.com
  • wildcard expected to cover too many label levels (*.example.com matches api.example.com but not a.b.example.com or example.com itself)
  • IP connection attempted against a DNS-only certificate
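
Those three mistakes can be reproduced with a small matcher. This is a simplified sketch: it only supports a wildcard as the entire leftmost label, ignores internationalized names and public-suffix restrictions, and its function names are assumptions for illustration.

```python
import ipaddress

def is_ip_literal(host: str) -> bool:
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False

def dns_name_matches(pattern: str, hostname: str) -> bool:
    p = pattern.lower().rstrip(".").split(".")
    h = hostname.lower().rstrip(".").split(".")
    # A wildcard stands for exactly one label, so label counts must agree.
    if len(p) != len(h) or "" in h:
        return False
    if p[0] == "*":
        return p[1:] == h[1:]
    return p == h

def san_covers(dns_sans: list, hostname: str) -> bool:
    # An IP destination needs an iPAddress SAN; DNS names never match it.
    if is_ip_literal(hostname):
        return False
    return any(dns_name_matches(name, hostname) for name in dns_sans)

print(san_covers(["www.example.com"], "api.example.com"))  # False
print(san_covers(["*.example.com"], "api.example.com"))    # True
print(san_covers(["*.example.com"], "a.b.example.com"))    # False
print(san_covers(["*.example.com"], "example.com"))        # False
print(san_covers(["example.com"], "192.0.2.10"))           # False
```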

Step 5: Distinguish PKI failure from pinning failure

If a mobile app or private client rejects a site while browsers succeed, pinning may be the extra rule that failed. Ask:

  • did the key rotate?
  • was the pin over the whole certificate instead of SPKI?
  • was the backup pin absent?
  • does the app pin an intermediate rather than the leaf?

This is a common source of confusion because the visible symptom is just "TLS failed", but the public PKI path may actually be fine.

Step 6: Check client trust context

Validation is always relative to a trust store. So ask:

  • which root bundle does the client actually use?
  • is this a browser with its own store or an OS-native client?
  • is an enterprise root installed?
  • is the client in a stripped-down container with an incomplete CA bundle?

A certificate that works on a fully provisioned laptop may fail in a minimal Docker image simply because the image lacks updated CA certificates.
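
For clients that use Python's standard ssl module, a quick way to see which trust context is in effect is to ask the module directly. This sketch only reports what the local runtime is configured to use; the helper name is an assumption, and other runtimes (Java, Go, browsers) keep their own stores that need their own inspection.

```python
import ssl

def describe_trust_context() -> dict:
    # Where the linked OpenSSL looks for CA certificates by default.
    paths = ssl.get_default_verify_paths()
    # A default client context loads the system trust anchors.
    ctx = ssl.create_default_context()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        # Certificates in a capath directory are loaded lazily, so a low
        # count here does not by itself prove the store is empty.
        "store_stats": ctx.cert_store_stats(),
        "check_hostname": ctx.check_hostname,
        "verify_mode": ctx.verify_mode,
    }

for key, value in describe_trust_context().items():
    print(f"{key}: {value}")
```

If cafile and capath are both None inside a container, the image probably lacks a CA bundle, which is a common cause of the unable to get local issuer certificate error.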

Step 7: Check time and revocation signals

If the clock is wrong or the stapled OCSP response is stale, you may get misleading validation errors. This is especially common on internal appliances, embedded systems, and freshly booted lab VMs.
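
Both effects come down to simple time comparisons. The sketch below uses invented dates and hypothetical helper names, and the five-minute clock-skew tolerance is an assumption; real clients choose their own tolerance.

```python
from datetime import datetime, timedelta, timezone

def cert_time_status(not_before: datetime, not_after: datetime,
                     now: datetime) -> str:
    if now < not_before:
        return "not yet valid"
    if now > not_after:
        return "expired"
    return "valid"

def ocsp_response_fresh(this_update: datetime, next_update: datetime,
                        now: datetime,
                        skew: timedelta = timedelta(minutes=5)) -> bool:
    # Assumed tolerance: accept responses issued slightly "in the future".
    return this_update - skew <= now <= next_update

def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)

not_before, not_after = utc(2025, 5, 1), utc(2025, 7, 30)
real_now = utc(2025, 6, 1)
reset_clock = utc(2020, 1, 1)  # e.g. a freshly booted VM without time sync

print(cert_time_status(not_before, not_after, real_now))     # valid
print(cert_time_status(not_before, not_after, reset_clock))  # not yet valid

# A stapled response the server stopped refreshing a few days ago.
print(ocsp_response_fresh(utc(2025, 5, 20), utc(2025, 5, 27), real_now))  # False
```

The certificate itself is fine in both failing cases, which is why these errors are misleading until the clock and the staple are checked separately.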

The debugging discipline here is simple:

  • inspect the bytes presented
  • inspect the X.509 fields
  • verify the chain explicitly
  • verify hostname and trust store
  • only then look for exotic explanations

PKI is complicated, but its failures are rarely random. They usually become legible as soon as you separate certificate structure, path building, trust anchors, and local extra policy such as pinning.

Why Private PKI Feels Simpler and Harder at the Same Time

Inside organizations, teams often run an internal PKI for:

  • mTLS between services
  • device identity
  • VPN client authentication
  • internal dashboards and APIs

Private PKI is simpler because you control the root store and policy. You do not need the public web's universal trust model. But it is also harder operationally because you become the CA operator, trust-store distributor, revocation authority, and incident responder.

Public PKI outsources trust establishment to browser and OS ecosystems. Private PKI gives you more control at the price of more responsibility.

That is why tools like SPIFFE, SPIRE, Vault PKI, cert-manager, and cloud-managed private CA products exist. They are all attempts to automate the painful parts:

  • issuance
  • short-lived certificates
  • rotation
  • trust distribution
  • revocation / replacement workflows

The Most Common PKI Failure Modes Are Mundane

Most production PKI incidents are not exotic breaks of number theory. They are ordinary operational errors:

  • expired certificate
  • missing intermediate
  • wrong hostname in SAN
  • wrong EKU for the certificate's intended use
  • stale stapled OCSP response
  • clients missing the required root
  • accidentally pinning one key with no backup
  • forgetting that different client platforms build chains differently

This is important because engineers sometimes over-focus on the abstract cryptography and under-focus on the concrete validation rules. In real systems, PKI usually fails at the seams between policy, metadata, and deployment automation.

The Useful Mental Model

The right way to think about PKI is:

  1. a certificate is a signed statement binding identity to a public key
  2. a chain is a path from that statement back to a trust anchor the client already accepts
  3. extensions define what each certificate is allowed to do
  4. revocation tries to invalidate certificates before expiry, imperfectly
  5. OCSP stapling reduces the cost and privacy leakage of status checks
  6. Certificate Transparency makes issuance visible so mis-issuance can be detected
  7. pinning adds a stricter local trust rule, usually over SPKI hashes, on top of normal PKI validation

Once you see those layers separately, the system stops looking like random TLS bureaucracy.

PKI is the mechanism that lets a browser in Athens connect to a server in Paris, see a public key it has never seen before, and decide with reasonable confidence whether that key belongs to the real site. TLS then uses that authenticated key to verify the server's handshake signature, while ephemeral key exchange establishes the encrypted channel. CT makes the issuance auditable. Revocation tries, imperfectly, to unwind bad certificates. Pinning narrows trust for clients willing to manage the operational sharp edges.

That is how PKI actually works.