Replaces the always-spawn-an-OCI-container model with a per-call
bubblewrap sandbox. Pandoc and chromium binaries are baked into the
zddc-server runtime image; each conversion runs them under bwrap's
Linux-namespace isolation. No daemon, no socket, no privileged outer
container, no OCI image pull at conversion time.
Why: the OCI engine paid ≈ 350 MB image pulls + 400 MB persistent
storage + ~300 ms per-conversion startup, plus required either an
on-host daemon socket (zddc-RCE → host-RCE in one hop) or nested
container privileges. bwrap gets the same sandbox properties
(--unshare-all, ro-bind /usr, tmpfs /tmp, clearenv, no-network) at
~5 ms per call and zero external dependencies. This is the same
primitive Flatpak uses for every app launch — battle-tested at scale
for "untrusted-input, short-lived, isolated."
Runner abstraction:
- `Runner.Run` signature: image string → ToolSpec{Image, Binary}.
Both fields populated by entry points; whichever engine is
installed reads the one it needs.
- `bwrapRunner` (new): assembles bwrap argv via `buildBwrapArgs`
helper (testable in isolation), spawns bwrap with the binary.
- `containerRunner` (renamed conceptually to "legacy fallback"):
unchanged behavior, still reachable for hosts that prefer OCI
containers per conversion.
Probe order in health.Probe: bwrap → podman → docker. First hit wins.
Engine kinds in Capabilities: "bwrap" | "podman" | "docker". The
no-engine error message now lists all three.
Config (cmd/zddc-server):
- new --convert-pandoc-binary / ZDDC_CONVERT_PANDOC_BINARY (default "pandoc")
- new --convert-chromium-binary / ZDDC_CONVERT_CHROMIUM_BINARY (default "chromium-browser")
- existing --convert-pandoc-image / --convert-chromium-image kept
for the OCI engine, doc updated to clarify they only apply there.
- --convert-engine helptext lists bwrap first.
Images:
- New `zddc/runtime.Containerfile` — alpine + bubblewrap + pandoc-cli +
chromium + font-noto. Documents build/publish workflow.
- helm/zddc-server-prod/values.yaml.example: runtimeImage default
switched to a placeholder for the new bundled runtime image; bare
alpine NO LONGER works for /.convert (clearly called out in the
comment).
- bitnest dev: /var/lib/zddc-dev-build/Containerfile mirrors the
production runtime image. Quadlet at /etc/containers/systemd/
zddc.container drops the podman-socket mount (no longer needed)
and sets ZDDC_CONVERT_ENGINE=bwrap explicitly to avoid silent
downgrades if a stray podman ends up on PATH.
Tests:
- convert_test.go: fakeRunner / recordingRunner now record ToolSpec.
- New TestToolSpecPopulation pins that both Image and Binary are
filled by every entry point.
- New TestBwrapArgs_SandboxFlagsPresent / MountTranslation /
RejectsBadMountSpec lock in the bwrap argv shape — a refactor that
drops a hardening flag or misroutes a mount fails this loud.
Docs:
- AGENTS.md § "Server-side document conversion" rewritten around
the bwrap-first model with podman/docker as legacy fallbacks.
- ARCHITECTURE.md convert reference updated.
- internal/convert package doc reflects the two-engine probe order.
Verified end-to-end on bitnest: probe reports
engine=bwrap pandoc_binary=pandoc chromium_binary=chromium-browser
on startup. All 15 Go test packages green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A focused security review of phases 1-4 surfaced one MEDIUM finding
(confidence 9/10): in client mode (--upstream set) the cache layer
forwards the configured bearer to upstream on every incoming request
without authenticating the local caller, AND --addr defaulted to
:8443 (all interfaces). Together those mean a CLI user running
`zddc-server --upstream https://master --bearer-file ~/token` on a
laptop on hotel/cafe Wi-Fi exposes an open-proxy confused-deputy:
any attacker on the same L2 connects to https://<laptop-ip>:8443,
accepts the self-signed cert, issues GETs (or PUTs/DELETEs that
queue in the outbox), and the cache laundries each request through
upstream with the engineer's bearer. The full cached subtree leaks.
Two layers of defense in config.Load:
1. Loopback default in client mode. When cfg.Upstream is set and
neither --addr nor ZDDC_ADDR was passed explicitly, --addr
downgrades to "127.0.0.1:8443" (vs ":8443" in master mode). CLI
users on a laptop get safe-by-default. Operators who want a
non-loopback bind opt in explicitly.
2. Refuse non-loopback bind + bearer-file without acknowledgement.
When cfg.Upstream is set, BearerFile is non-empty, the chosen
addr is non-loopback, AND --insecure-direct is not set, the load
fails with an error that names the bind, the threat (open-proxy
confused-deputy laundering bearer credentials), and the
acknowledgement flag. The helm zddc-server-cache/ chart already
sets ZDDC_INSECURE_DIRECT=1 and relies on Kubernetes-namespaced
pod networking for the gating, so the chart path is unaffected.
The guard is bearer-file-conditional because proxy mode without a
bearer doesn't have a credential to launder, and refusing it
would needlessly block proxy-without-auth deployments.
Tests in internal/config/config_test.go lock down all four cases:
- --upstream with no explicit --addr → 127.0.0.1:8443
- --upstream + non-loopback --addr + --bearer-file (no IDirect) → refuse
- --upstream + non-loopback --addr + --bearer-file + --insecure-direct → ok
- --upstream + non-loopback --addr + NO bearer → ok (no credential to leak)
Doc updates: zddc/README.md client-mode "Flags" section gets a
WARNING block describing the loopback default + insecure-direct
escape hatch. AGENTS.md ZDDC_UPSTREAM row mentions the addr
downgrade. ARCHITECTURE.md gains a "Confused-deputy guard at
startup" subsection under "Master + proxy/cache/mirror" with the
two-layer defense rationale. helm/zddc-server-cache/values.yaml.example
adds an inline note next to addr: ":8080" explaining why the chart
sets ZDDC_INSECURE_DIRECT=1 and what the consequence is of removing
either side of the gating.
Master mode is unaffected — the client-mode validation block is
gated by `if cfg.Upstream != ""`. All existing tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New chart helm/zddc-server-cache/ deploys zddc-server in client mode
against an upstream master. Mirrors the prod chart's source-build-via-
init-container pattern but with:
- ZDDC_UPSTREAM, ZDDC_MODE, ZDDC_BEARER_FILE, ZDDC_NO_AUTH,
ZDDC_SKIP_TLS_VERIFY, ZDDC_MIRROR_SUBTREE, ZDDC_MIRROR_MIN_INTERVAL
wired from values.yaml. Mirror-only env vars conditionally rendered
(only when mode=mirror) to keep the rendered manifest minimal.
- Bearer token mounted from a separately-created Kubernetes Secret
(defaultMode 0400) at /etc/zddc/bearer/token. values.yaml.example
documents the secret-creation flow but contains no token. Secret
reference can be set to "" to disable bearer auth (only valid for
upstreams running --no-auth).
- Recreate strategy + replicaCount: 1 (multiple replicas would race
the cache directory and double the upstream walker traffic).
- TCP-socket probes instead of HTTP — HTTP probes against / would
fail when both upstream is unreachable AND the cache is empty
(the cache layer returns 503 + offline header in that state),
causing crashloops. TCP verifies process liveness without depending
on upstream reachability or cache contents.
- Mounts a separate cache PVC (operator-provided, like the master's
data PVC). Sized to the working set you expect to mirror; can be
much smaller than the master's data volume.
Existing prod and dev charts gain optional ZDDC_NO_AUTH wired from
zddc.env.noAuth (default false → no change to existing rendered
manifests). Useful for trusted-LAN or genuinely-public master
deployments.
Updated docs: helm/README.md gains the cache row in the chart table,
the cache-install quickstart with the secret-creation flow, and the
cache-specific structural notes (Recreate / TCP probes / single-
instance). CLAUDE.md and ARCHITECTURE.md updated to reflect three
charts instead of two.
Verified with helm template rendering: ZDDC_NO_AUTH only renders
when noAuth: true; ZDDC_MIRROR_SUBTREE / ZDDC_MIRROR_MIN_INTERVAL
only render when mode: mirror; bearer volume + ZDDC_BEARER_FILE
only render when bearer.secretName is non-empty.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The dev chart's overlay-isolation layer (added in 9765fa2) was not
called out in helm/README.md or zddc-server-dev/Chart.yaml. Readers
comparing the two charts saw "same shape but tracks main" without
learning that the dev chart wraps the data PVC in OverlayFS so its
writes never mutate the underlying store.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two interlocking pieces shipped together:
1. Strict Ed25519 signature verification on URL-fetched apps artifacts.
Every URL the apps cascade resolves must publish a corresponding
<url>.sig (raw 64-byte Ed25519 signature). The fetcher rejects on
any failure (sig 404, transport error, wrong key, tampered body)
and the resolver falls back to the embedded copy.
The trusted public key is OPERATOR-CONFIGURED via --apps-pubkey /
ZDDC_APPS_PUBKEY (PEM file path). No baked-in default — same posture
as TLS certificates. Operators using zddc.varasys.io's canonical
channels download pubkey.pem from there and configure the local
path. Operators with their own signing infrastructure pass their
own public key.
Build pipeline (./build) gains sign_release_artifacts: walks
dist/release-output/ after promote and produces an Ed25519 .sig
alongside every real file. ZDDC_SIGNING_KEY=~/.config/zddc-signing/
key.pem (mode 0600). Symlinks skip — the .sig at the symlink
target is what counts.
Test coverage: parse-PEM round-trip, malformed/wrong-type PEM
rejection, valid-signature accept, tampered-body reject, wrong-key
reject, malformed-signature reject, end-to-end fetch+sign+verify,
fetch-rejects-tampered, fetch-rejects-missing-sig, fetch-rejects-
wrong-key. Existing fetch tests updated to use signed-fixture
helpers.
2. Dev Helm chart mounts production data READ-ONLY and layers an
OverlayFS writable scratch on top. Prod data is the lowerdir;
dev's writes (form submissions, archive index state, .zddc edits)
land in upperdir; main container sees the merged read-write view
at $ZDDC_ROOT. Setup runs in a privileged init container; main
container runs unprivileged. Solves the dev-replica-on-shared-
dataset problem at the filesystem layer with no zddc-server code
change.
Docs: env-var tables in zddc/README.md and AGENTS.md gain a
ZDDC_APPS_PUBKEY row. The Federal-readiness gap analysis "Code-signed
apps: URL fetches" subsection is rewritten as "what's currently in
place" instead of "what would need to be added," with a forward
pointer to per-entry signed_by: (multi-key) and Sigstore as the
federally-acceptable evolution.
The website "Verify your downloads" section + the embedded pubkey
gone — but the website needs separate updates landing in zddc-website
to publish pubkey.pem and add the verify section. Pending in that
repo's commit.
Production binary unchanged at 13.1 MB. All 11 Go test packages green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two safe-by-default flips, both opt-out via explicit acknowledgement.
1. --insecure / ZDDC_INSECURE=1: zddc-server now refuses to start when
no <ZDDC_ROOT>/.zddc exists. With no .zddc anywhere in the chain,
AllowedWithChain falls through to "HasAnyFile=false → allow" and
the tree is publicly accessible to anonymous callers — almost never
what an operator wants on a fresh deployment, and previously a
silent footgun. The flag is the escape hatch for deliberately-
public archives (no .zddc anywhere by design).
2. ZDDC_CORS_ORIGIN now defaults to empty (CORS disabled) instead of
the canonical "https://zddc.varasys.io". The embedded-tools install
path serves tools and data same-origin, so the default never needed
to permit cross-origin XHRs from a third-party host. Every deployment
was implicitly trusting zddc.varasys.io to make authenticated XHRs
on behalf of every logged-in user; if that origin were ever
compromised, the blast radius extended to every customer server.
Operators who deliberately use the CDN-bootstrap pattern or self-
hosted tools at a different host now set the value explicitly.
Helm chart values updated accordingly: prod default is empty; dev
keeps localhost:8000 for tool-iteration workflows. Existing deployments
that depended on the old defaults will need to either set the value
explicitly or pass --insecure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two charts under helm/, both compile zddc-server from source via an
init container — no container image registry, no pre-built binary.
The init container clones the repo at a configured git ref, runs
`go build`, and writes the binary into a shared emptyDir; the main
container is alpine + the freshly built static binary.
helm/zddc-server-prod/ Production-shaped:
- gitRef pinned to a stable tag in
values.yaml.example (zddc-server-v0.0.7).
- imagePullPolicy IfNotPresent.
- Slower probe cadence (30s liveness, 10s
readiness).
- ZDDC_LOG_LEVEL=info.
- replicaCount: 1 (operators raise as needed
when backed by a shared filesystem).
helm/zddc-server-dev/ Dev/soak-shaped:
- gitRef defaults to "main" (rebuilt every pod
restart). build-time annotation forces
recreate on every helm upgrade.
- imagePullPolicy Always on the build image
so the latest golang:1.24-alpine is pulled.
- Faster probe cadence (10s liveness, 5s
readiness) — fail-fast in dev.
- ZDDC_LOG_LEVEL=debug. NOTE: debug logs every
request's full header map (includes auth
tokens / cookies) — this chart is for
private dev namespaces only.
- Strategy: Recreate (single replica racing
on different SHAs would be a mess).
Both charts:
- Wire the ZDDC_* env-var contract (ZDDC_ROOT, ZDDC_ADDR,
ZDDC_TLS_CERT=none, ZDDC_INSECURE_DIRECT=1, ZDDC_EMAIL_HEADER,
ZDDC_CORS_ORIGIN, ZDDC_LOG_LEVEL, ZDDC_INDEX_PATH).
- Mount a caller-supplied PVC at ZDDC_ROOT (chart does not create the
PVC; operators provision storage themselves).
- Optional Ingress (ingress.enabled: true). TLS is expected to be
terminated upstream of the pod; the pod listens on plain HTTP.
- No secrets in values.yaml.example. ACL email lists go in .zddc files
inside the data volume; image-pull and TLS secrets are referenced by
name only.
helm/README.md documents the design rationale (why build from source
instead of using a registry image), a quick-start example, and the
explicit list of what the charts do and don't do.
Note: `helm lint` cannot be run in this dev environment (helm isn't
installed). YAML syntax of Chart.yaml and values.yaml.example
verified via `python3 -c "yaml.safe_load(...)"`. Operators should
run `helm lint` and `helm template` before installing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>