Docker Hardened Images Are Free Now — Here's What You Still Need to Build

Docker Hardened Images solve the CVE problem. But CVEs aren't why containers fail in production — governance gaps are. Here's the trust architecture that closes them.

Shamsher Khan

May. 27, 26 · Analysis

The Problem Isn't the Image

Hardened container images are no longer niche. Docker open-sourced major portions of the tooling behind Docker Hardened Images under Apache 2.0 in late 2025. Chainguard and Google's distroless variants sit in the same space. The pitch across all three: fewer packages, smaller attack surface, dramatically lower CVE counts. The pitch is accurate. It is also incomplete.

Most container security failures are not image failures. They are governance failures:

A team pushes a debug build to production. Admission control doesn't block it because the policy is in Audit mode, not Enforce.
A six-month-old deployment keeps running an ancient image digest while the team patches newer builds. Nobody detects the drift.
The platform team rotates signing keys. Old pipelines keep producing images signed with the revoked key. Admission still accepts them. Nobody notices for ninety days.
A vendor pushes an updated base image under the same tag. CI rebuilds against the new digest. The new digest is unsigned. Production takes it. No alert fires.

None of these are CVE failures. They are governance failures — gaps in how images are produced, attested, verified, and monitored. Swapping the base image to a hardened variant changes none of them. A signed-and-attested hardened image in a cluster that doesn't verify signatures is operationally equivalent to a signed Ubuntu image in that cluster: the signature is decorative.

I recently worked on migrating a regulated production workload onto a hardened-image baseline. Lab 12 of my docker-security-practical-guide repository is a sanitized, reproducible distillation of what that work taught me. The short version: the value is in the control plane around the image, not the image itself.

The Trust Control Plane in 60 Seconds

In practice, the hardest part is not enabling hardened images. It is operating trustworthy deployments at scale without slowing engineers down.

The operating model has three layers, joined by a feedback loop:

Supply Chain layer – images are signed (cosign keyless against Fulcio), attested with an SBOM (syft + CycloneDX), and scanned for vulnerabilities (grype). The output: an image whose origin and contents are independently verifiable by anyone.
Trust layer – an admission controller (Kyverno) verifies signatures and attestations before any pod is scheduled. The admission policy is the unit of governance: it encodes which signers, which attestations, and which constraints are required for a workload to start.
Enforcement layer – continuous drift detection answers the questions admission can't: has the digest drifted since we admitted it? Has the signing key been revoked? Has a new unsigned workload landed via a controller that bypasses admission?
Feedback loop – drift findings feed back into the supply chain: a drift event produces a rebuild; an admission rejection produces a ticket. Without the loop, the enforcement layer becomes an alerting backwater that engineers mute.

FIGURE 1 — Trust control plane for cloud-native software supply chain security.
The architecture separates supply chain generation, admission-time trust verification, and continuous runtime enforcement into independent layers connected through a feedback loop. The pattern is vendor-agnostic: any compatible signing, admission, and drift-detection components can fulfill these roles.

The bottom line: a hardened image is one input to the supply chain layer. Without trust verification, it's indistinguishable from a regular image at deploy time. Without enforcement, untrusted images coexist with hardened images in the same cluster. Without the feedback loop, trust state drifts silently.
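To make the feedback loop concrete, here is a minimal Python sketch of the routing described above: a drift finding maps to a rebuild, an admission rejection maps to a ticket. The class names, action names, and workload names are hypothetical illustrations, not part of the lab or of any tool's API.

```python
from dataclasses import dataclass
from enum import Enum


class Finding(Enum):
    DIGEST_DRIFT = "digest_drift"
    ADMISSION_REJECTION = "admission_rejection"


class Action(Enum):
    REBUILD = "trigger_rebuild"
    TICKET = "open_ticket"


# Mapping taken from the text: drift -> rebuild, rejection -> ticket.
ROUTES = {
    Finding.DIGEST_DRIFT: Action.REBUILD,
    Finding.ADMISSION_REJECTION: Action.TICKET,
}


@dataclass(frozen=True)
class Event:
    workload: str  # hypothetical workload name
    finding: Finding


def route(event: Event) -> tuple[str, Action]:
    """Return the (workload, action) pair the supply chain layer should act on."""
    return event.workload, ROUTES[event.finding]


if __name__ == "__main__":
    events = [
        Event("payments-api", Finding.DIGEST_DRIFT),
        Event("debug-build", Finding.ADMISSION_REJECTION),
    ]
    for e in events:
        workload, action = route(e)
        print(f"{workload}: {action.value}")
```

The point of keeping the mapping explicit is that every finding type has an owner and an outcome. A finding with no route is the kind of alert that ends up muted.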

Admission Control: Where Governance Gets Teeth

The trust layer is where the control plane becomes operationally real. In the lab, Kyverno's verifyImages rule asserts that every image carries a cosign signature from an approved identity. Here's the core of the policy:

YAML

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  # Enforce blocks non-compliant Pods; Audit would only log violations.
  validationFailureAction: Enforce
  rules:
    - name: verify-cosign-keyless
      match:
        any:
          - resources:
              kinds: [Pod]
      verifyImages:
        # Images this rule applies to.
        - imageReferences: ["ghcr.io/opscart/*"]
          attestors:
            - entries:
                - keyless:
                    # Identity recorded in the signing certificate.
                    subject: "https://github.com/opscart/*"
                    # OIDC provider that issued the workflow token.
                    issuer: "https://token.actions.githubusercontent.com"
          required: true

The subject and issuer together define who is trusted. For DHI images, these values point to Docker's signing identity. For Chainguard, Chainguard's. The shape of the policy is identical in all cases — only the identity matcher changes.
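As a rough illustration of what the identity matcher decides, the sketch below treats the subject as a glob pattern and the issuer as an exact string. This is an approximation for intuition only: it uses Python's fnmatch, and Kyverno's own wildcard handling may differ in details. The example certificate subjects (including the "@refs/heads/main" suffix) are assumptions for demonstration.

```python
from fnmatch import fnmatchcase


def is_trusted(cert_subject: str, cert_issuer: str,
               subject_pattern: str, issuer: str) -> bool:
    """Approximate a keyless identity check.

    The issuer must match exactly; the subject is matched against a
    '*' wildcard pattern. Both conditions must hold.
    """
    return cert_issuer == issuer and fnmatchcase(cert_subject, subject_pattern)


# Values from the policy above.
SUBJECT_PATTERN = "https://github.com/opscart/*"
ISSUER = "https://token.actions.githubusercontent.com"

if __name__ == "__main__":
    # Hypothetical certificate subjects, for illustration only.
    ours = ("https://github.com/opscart/docker-security-practical-guide/"
            ".github/workflows/supply-chain-gate.yml@refs/heads/main")
    theirs = "https://github.com/someone-else/repo/.github/workflows/ci.yml@refs/heads/main"
    print(is_trusted(ours, ISSUER, SUBJECT_PATTERN, ISSUER))
    print(is_trusted(theirs, ISSUER, SUBJECT_PATTERN, ISSUER))
```

Swapping vendors means changing SUBJECT_PATTERN and ISSUER and nothing else, which is the property the policy shape is meant to preserve.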

When someone deploys an image that violates policy, in this case one from an untrusted registry, the rejection is immediate and actionable:

Shell

$ kubectl run test --image=nginx:latest --restart=Never
Error from server: admission webhook "validate.kyverno.svc-fail"
denied the request:

resource Pod/default/test was blocked due to the following policies

require-trusted-registry:
  trusted-registries-only: 'validation error: Image must come from
    a trusted registry. Allowed: dhi.io/*.'

FIGURE 2 — Kyverno admission webhook rejecting an nginx pod from an untrusted registry. To reproduce, run the kubectl run command above with the cluster up and the policies applied.

Catching an unsigned image at admission costs one re-run of kubectl apply. Catching the same workload running in production a week later costs a security ticket, an incident response, and possibly a regulatory disclosure conversation. Moving rejection earlier is the highest-leverage decision in the entire model.

Phased Rollout: Audit Before Enforce

In production, you don't flip everything to Enforce on day one. The lab uses a phased approach: the trusted-registry policy runs in Enforce mode (hard gate on image origin), while signature and SBOM verification policies run in Audit mode (log violations, don't block). This gives teams a migration runway: they can see which workloads would fail and fix them before the policies graduate to Enforce. The shift from Audit to Enforce is a single-field YAML change.
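The phased behavior can be sketched in a few lines of Python. This is a simplified model of the decision, not Kyverno's implementation: a failed Enforce policy blocks the workload, while a failed Audit policy only produces a log entry. The policy name "require-sbom-attestation" is a hypothetical placeholder; the other two names appear earlier in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyResult:
    policy: str
    mode: str     # "Enforce" or "Audit"
    passed: bool


def admit(results: list[PolicyResult]) -> tuple[bool, list[str]]:
    """Return (allowed, audited_violations).

    Only Enforce-mode failures block admission. Audit-mode failures are
    collected so teams can see what would break after graduation.
    """
    blocked = [r.policy for r in results if not r.passed and r.mode == "Enforce"]
    audited = [r.policy for r in results if not r.passed and r.mode == "Audit"]
    return (not blocked), audited


if __name__ == "__main__":
    results = [
        PolicyResult("require-trusted-registry", "Enforce", passed=True),
        PolicyResult("require-signed-images", "Audit", passed=False),
        PolicyResult("require-sbom-attestation", "Audit", passed=False),  # hypothetical name
    ]
    allowed, audited = admit(results)
    print("allowed:", allowed)
    print("would fail after graduation:", audited)
```

The audited list is the migration backlog: every entry is a workload that will be rejected the day the corresponding policy moves to Enforce.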

Signing Your Supply Chain: Keyless Cosign

The supply chain layer produces the artifacts that admission verifies. A common modern approach uses cosign with GitHub Actions OIDC for keyless signing — no private keys to manage, rotate, or leak.

The mechanism: GitHub Actions mints a short-lived OIDC token at workflow time. Cosign exchanges it for an ephemeral certificate from Sigstore Fulcio, signs the image, and destroys the key immediately. The certificate records which workflow, on which repository, at which commit, produced the signature. The signature is logged in Sigstore Rekor's public transparency log.

The lab's pipeline implements a full build → push → sign → attest → verify flow that fails closed if verification breaks. The complete workflow and run history is public.

The important property is that anyone can independently verify the signed artifact.

Shell

$ cosign verify \
    --certificate-identity-regexp \
      '^https://github\.com/opscart/docker-security-practical-guide/\.github/workflows/supply-chain-gate\.yml@.+$' \
    --certificate-oidc-issuer \
      "https://token.actions.githubusercontent.com" \
    ghcr.io/opscart/docker-security-practical-guide/dhi-sample-app:latest

Verification for ghcr.io/opscart/.../dhi-sample-app:latest --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - Existence of the claims in the transparency log was verified offline

FIGURE 3 — cosign verify succeeds for any reader, without shared secrets. To reproduce, run the cosign verify command above against the published image at ghcr.io.

This is what "supply chain security" means in practice: not "we sign our images," but "our trust assertions are independently verifiable by anyone, against neutral infrastructure, without prior trust setup." The published image can be verified directly against the public artifact.
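It helps to see how narrow the identity regex in the cosign verify command above is. The sketch below applies the same pattern with Python's re module; cosign itself uses Go regular expressions, so treat this as an approximation that happens to behave the same for a pattern this simple. The example identities, including the "@refs/heads/main" suffix and the "attacker" organization, are hypothetical.

```python
import re

# Same pattern as the --certificate-identity-regexp argument above.
IDENTITY_RE = re.compile(
    r"^https://github\.com/opscart/docker-security-practical-guide/"
    r"\.github/workflows/supply-chain-gate\.yml@.+$"
)


def identity_matches(certificate_identity: str) -> bool:
    """True if the certificate identity names the expected workflow file."""
    return IDENTITY_RE.match(certificate_identity) is not None


if __name__ == "__main__":
    base = "https://github.com/{org}/docker-security-practical-guide/.github/workflows/{wf}@refs/heads/main"
    candidates = [
        base.format(org="opscart", wf="supply-chain-gate.yml"),   # expected workflow
        base.format(org="attacker", wf="supply-chain-gate.yml"),  # fork under another org
        base.format(org="opscart", wf="other.yml"),               # different workflow, same repo
    ]
    for identity in candidates:
        print(identity_matches(identity), identity)
```

Pinning the workflow file, not just the repository, means a signature produced by any other workflow in the same repository does not satisfy the check.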

Fleet Drift: The Problem Nobody Watches

Admission is point-in-time. Production is continuous. The enforcement layer's job is to answer the questions that admission can't: has the digest drifted since we admitted it? Has a new unsigned workload landed via a controller that bypasses admission?

The lab's E1 experiment runs a drift audit against a synthetic 12-service fleet mixing DHI, Docker Hub, internally-built, and abandoned images. The fleet is intentionally constructed with an explicit variation matrix — the numbers below describe the synthetic fleet's structure, not measurements from a deployed environment.

In this synthetic fleet, unsigned services averaged 13.0 critical CVEs while signed-and-verified services averaged 0.0. The exact ratio will vary by environment, but the audit makes the trust gap continuously visible.

FIGURE 4a — Fleet drift audit: signing state vs CVE correlation across the synthetic fleet. Output of ./experiments/E1-drift-observation/analyze-drift.py, Sections 1–3 (Fleet Summary, Origin×Signing Correlation, Signing State → CVE Accumulation).

FIGURE 4b — Remediation order: compliance-scope risk concentration and prioritized action queue. Same script output, Sections 4 and 7 (Compliance Scope Risk Concentration, Recommended Remediation Order).

The ratio isn't the point — your fleet will produce different numbers. What the control plane provides is the continuous, attributable surfacing of whatever the ratio actually is, including cases where the supposed benefit of hardening is harder to defend. That honest feedback loop is what turns the audit from a compliance checkbox into a supply chain prioritization tool.
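For readers who want the shape of such an audit without opening the lab, here is a minimal Python sketch. It is not the lab's analyze-drift.py: the data model, workload names, digests, and CVE counts are invented placeholders, and the two checks (digest comparison and mean critical CVEs per signing state) are my simplification of what a drift audit reports.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Workload:
    name: str
    admitted_digest: str   # digest recorded when the pod was admitted
    running_digest: str    # digest observed in the cluster now
    signed: bool
    critical_cves: int


def drifted(fleet: list[Workload]) -> list[str]:
    """Names of workloads whose running digest no longer matches admission."""
    return [w.name for w in fleet if w.running_digest != w.admitted_digest]


def mean_criticals_by_signing(fleet: list[Workload]) -> dict[str, float]:
    """Average critical CVE count for signed and unsigned workloads."""
    groups: dict[str, list[int]] = {"signed": [], "unsigned": []}
    for w in fleet:
        groups["signed" if w.signed else "unsigned"].append(w.critical_cves)
    return {state: mean(counts) for state, counts in groups.items() if counts}


if __name__ == "__main__":
    # Invented placeholder data, not measurements and not the lab's fleet.
    fleet = [
        Workload("svc-a", "sha256:aaa", "sha256:aaa", signed=True, critical_cves=0),
        Workload("svc-b", "sha256:bbb", "sha256:ccc", signed=False, critical_cves=4),
        Workload("svc-c", "sha256:ddd", "sha256:ddd", signed=False, critical_cves=6),
    ]
    print("drifted:", drifted(fleet))
    print("mean critical CVEs:", mean_criticals_by_signing(fleet))
```

Run on a schedule, the two outputs are what feed the loop: drifted workloads go to rebuild, and the per-state averages show where remediation effort is concentrated.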

The Substitution Test

A useful test for whether you've found an architectural pattern or a vendor recipe: can you swap a major component and have everything else continue to work?

For this architecture, the test is straightforward. The lab demonstrates three configurations: Docker Hardened Images (dhi.io), Chainguard Images (cgr.dev/chainguard), and a self-built Alpine base signed against a project-owned GitHub Actions OIDC identity. In all three, the Kyverno policy structure is identical. The drift audit runs unchanged. The SBOM verification runs unchanged. Edits are confined to the identity matcher and the image references.

The implication: "Should we standardize on DHI or Chainguard?" is a commercial decision (pricing, catalog coverage, support), not an architectural one. The architectural decision is whether to operate the trust control plane at all. A team that has invested in the control plane has built portable institutional capability. A team that has invested in "we use DHI" has bought a product, and a future migration off DHI is a structural rewrite rather than a configuration update.
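The substitution test can be expressed as a check. The Python sketch below builds the policy from this article as a dictionary, parameterized only by image references and signer identity, then confirms that the three configurations are identical once those fields are blanked out. The DHI and Chainguard subject and issuer strings are deliberate placeholders, since I am not reproducing the vendors' real signing identities here; the helper names are hypothetical.

```python
import copy


def build_policy(image_refs: list[str], subject: str, issuer: str) -> dict:
    """Same structure as the require-signed-images policy shown earlier."""
    return {
        "apiVersion": "kyverno.io/v1",
        "kind": "ClusterPolicy",
        "metadata": {"name": "require-signed-images"},
        "spec": {
            "validationFailureAction": "Enforce",
            "rules": [{
                "name": "verify-cosign-keyless",
                "match": {"any": [{"resources": {"kinds": ["Pod"]}}]},
                "verifyImages": [{
                    "imageReferences": list(image_refs),
                    "attestors": [{"entries": [
                        {"keyless": {"subject": subject, "issuer": issuer}}
                    ]}],
                    "required": True,
                }],
            }],
        },
    }


def skeleton(policy: dict) -> dict:
    """Copy of the policy with the vendor-specific fields blanked out."""
    p = copy.deepcopy(policy)
    verify = p["spec"]["rules"][0]["verifyImages"][0]
    verify["imageReferences"] = None
    verify["attestors"][0]["entries"][0]["keyless"] = None
    return p


CONFIGS = {
    "self-built": build_policy(["ghcr.io/opscart/*"],
                               "https://github.com/opscart/*",
                               "https://token.actions.githubusercontent.com"),
    # Placeholder identities: substitute the vendor's documented values.
    "dhi": build_policy(["dhi.io/*"], "<docker-signing-subject>", "<docker-issuer>"),
    "chainguard": build_policy(["cgr.dev/chainguard/*"],
                               "<chainguard-signing-subject>", "<chainguard-issuer>"),
}

if __name__ == "__main__":
    reference = skeleton(CONFIGS["self-built"])
    for vendor, policy in CONFIGS.items():
        print(vendor, "structure identical:", skeleton(policy) == reference)
```

If a vendor swap ever forces a change outside the two blanked fields, that is the signal that the design has picked up a product dependency.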

Production Friction: What Actually Goes Wrong

The model works. It is also not free. Here are the operational costs my team hit, documented in detail in the companion repo's TROUBLESHOOTING.md:

No shell. Distroless hardened images don't include /bin/sh, curl, wget, cat, or ls. When an engineer is paged at 2 AM and runs kubectl exec -it pod -- /bin/sh, the command fails. The remediation is kubectl debug with an ephemeral debug container attached to the pod's process namespace. Train your on-call rotation on kubectl debug before migration, not after. The lab's E5 experiment documents three debug patterns (ephemeral containers, dev-variant images in dev namespaces only, pre-built debug sidecars) with runbook scenarios for unreachable services, crashloops, and OOM kills.

Migration is not a FROM line change. The default user is nonroot (UID 65532), not root. Library paths differ. pip install --user installs to /home/nonroot/.local, not /root/.local. Required system packages (ca-certificates, timezone data) that come for free in stock bases must be explicitly carried over. The lab's Dockerfile required three iterations before the build succeeded locally: shell-form RUN failed (no /bin/sh), then pip --user installed to the wrong path, then requirements.txt pinned package versions that didn't exist on PyPI. Each of these is a 30-second local fix — and a 5-minute GitHub Actions round-trip if you don't test locally first.

Signature paths vary by vendor. DHI signatures resolve via registry.scout.docker.com, not at the image's own registry path. Kyverno handles this through the policy's repository field, but any custom verification tooling needs to know. Plan to audit verification code before migration.

Kyverno has schema gotchas. rekor and ctlog blocks must be inside keys, not siblings. webhookTimeoutSeconds is capped at 30. mutateDigest: true is incompatible with validationFailureAction: Audit. PolicyException requires an explicit feature flag. Each of these cost me 30–60 minutes of debugging — they're in TROUBLESHOOTING.md, so they don't cost you the same.

None of these are deal-breakers individually. All of them together are why migrations slip from "next quarter" to "abandoned after two months." Budget for friction.
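Some of the Kyverno gotchas above are mechanical enough to catch before a policy ever reaches the cluster. The following Python sketch is a hypothetical pre-flight lint over a policy that has already been loaded into a dictionary. It encodes only three of the constraints as stated in this article, checks only explicitly set values (Kyverno's defaults are not modeled), and is no substitute for Kyverno's own validation.

```python
def lint_policy(policy: dict) -> list[str]:
    """Return human-readable problems for three known schema gotchas."""
    problems = []
    spec = policy.get("spec", {})

    # Gotcha: webhookTimeoutSeconds is capped at 30.
    timeout = spec.get("webhookTimeoutSeconds")
    if timeout is not None and timeout > 30:
        problems.append(f"webhookTimeoutSeconds={timeout} exceeds the cap of 30")

    audit = spec.get("validationFailureAction") == "Audit"
    for rule in spec.get("rules", []):
        for verify in rule.get("verifyImages", []):
            # Gotcha: mutateDigest: true is incompatible with Audit.
            if audit and verify.get("mutateDigest") is True:
                problems.append(f"{rule.get('name')}: mutateDigest: true with Audit")
            # Gotcha: rekor and ctlog belong inside keys, not beside it.
            for attestor in verify.get("attestors", []):
                for entry in attestor.get("entries", []):
                    misplaced = [k for k in ("rekor", "ctlog") if k in entry]
                    if "keys" in entry and misplaced:
                        problems.append(
                            f"{rule.get('name')}: {misplaced} must be nested under keys"
                        )
    return problems


if __name__ == "__main__":
    # Hypothetical policy fragment that trips all three checks.
    bad = {
        "spec": {
            "validationFailureAction": "Audit",
            "webhookTimeoutSeconds": 45,
            "rules": [{
                "name": "verify-with-key",
                "verifyImages": [{
                    "mutateDigest": True,
                    "attestors": [{"entries": [{"keys": {}, "rekor": {}}]}],
                }],
            }],
        }
    }
    for problem in lint_policy(bad):
        print("-", problem)
```

A check like this fits in CI next to the policy files, where a failure costs seconds instead of the 30 to 60 minutes each gotcha cost in the cluster.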

When This Is Overkill

The investment's value scales with three factors: regulatory pressure (HIPAA, PCI-DSS, SOC 2 Type II, FDA 21 CFR Part 11), fleet size and heterogeneity (8+ clusters, dozens of teams pushing images), and blast radius (pharmaceutical patient data vs. internal dashboard).

Concretely: pre-production tools, side projects, prototypes, and developer sandboxes do not need this. They benefit from a hardened base image (free) and should not be put behind the full trust control plane. The overhead of policy maintenance, key rotation, and drift remediation outstrips the risk reduction. For most workloads outside regulated production, the supply chain layer alone — sign and SBOM your builds — captures most of the available value at a fraction of the cost.

Conclusion: Architecture Over Image Choice

Hardened images are useful. The point of this article is that they are one component of a broader architectural pattern, and the security outcomes regulated teams want are properties of the pattern, not the component.

A team that adopts hardened images without the surrounding pattern has made a real but limited improvement. A team that adopts the pattern with any reasonable image vendor — DHI, Chainguard, or a self-built base — has built portable institutional capability. The substitution test is the diagnostic: ask whether a future migration away from your current image vendor is a configuration edit or a structural rewrite. If it's the former, you have the pattern. If it's the latter, you have a product dependency.

The companion repository at github.com/opscart/docker-security-practical-guide (tag v1.12.0) contains everything in this article: working Kyverno policies, a keyless-signed sample image you can pull and verify right now, fleet drift audits, and five hypothesis-driven experiments. The cosign verify command above works against the published artifact today.

Spend the design effort on the pattern. The image will be replaceable. The governance is what survives vendor replacement.

This article is adapted from a longer write-up on OpsCart, which includes the complete threat model, substitution-test configurations, and an extended troubleshooting log.

Published at DZone with permission of Shamsher Khan. See the original article here.