refactor(convert): wrapper-in-image owns the sandbox; Go just exec's binaries

The bwrap engine + OCI engine that lived in internal/convert/runner.go
both leaked isolation policy into Go code. Replaced with a single
image-side wrapper that shadows pandoc and chromium-browser on PATH as
a drop-in.
zddc-server's only contract with the image is now "exec.Command(name,
args) gets you that tool's behavior" — sandboxing, resource caps, and
namespace setup live entirely in shell scripts shipped by the image.

Architecture:
- zddc/runtime/zddc-cgroup-init runs at container start. cgroup v2's
  "no internal processes" constraint forbids a cgroup from having both
  children and processes; the init script moves PID 1 into a child,
  enables +memory +pids in subtree_control, then exec's zddc-server.
  Best-effort: degrades cleanly to "no resource caps" if cgroupfs
  isn't writable.
- zddc/runtime/zddc-sandbox-exec is the per-call wrapper, symlinked
  from /usr/local/bin/{pandoc,chromium-browser}. Creates a transient
  cgroup v2 (memory.max + pids.max), then bubblewrap-sandboxes the
  real binary at /usr/bin/<name>: --unshare-all, --ro-bind /usr,
  --proc /proc, --tmpfs /tmp, --clearenv. Caller's scratch dir comes
  in via ZDDC_SCRATCH env and is bind-mounted at the SAME path so
  absolute paths round-trip unchanged.
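
For illustration only: the real wrapper is a shell script, but the
transient-cgroup step can be sketched in Go as a list of cgroupfs
writes. The parent path, the per-call name, and the helper names are
hypothetical; only memory.max, pids.max and cgroup.procs are actual
cgroup v2 file names. It assumes zddc-cgroup-init already enabled
+memory +pids on the parent and that the caller mkdir's the per-call
directory first.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// cgroupWrite is one "echo VALUE > FILE" step against cgroupfs.
type cgroupWrite struct {
	Path  string
	Value string
}

// transientCgroupPlan lists the writes that cap a single conversion.
// parent and name are hypothetical; the wrapper picks its own layout.
func transientCgroupPlan(parent, name string, memBytes int64, maxPids, pid int) []cgroupWrite {
	dir := filepath.Join(parent, name)
	return []cgroupWrite{
		{filepath.Join(dir, "memory.max"), strconv.FormatInt(memBytes, 10)},
		{filepath.Join(dir, "pids.max"), strconv.Itoa(maxPids)},
		// Writing a PID to cgroup.procs moves that process into the cgroup.
		{filepath.Join(dir, "cgroup.procs"), strconv.Itoa(pid)},
	}
}

func main() {
	plan := transientCgroupPlan("/sys/fs/cgroup/conv", "call-1", 1024<<20, 256, 4242)
	for _, w := range plan {
		fmt.Printf("echo %s > %s\n", w.Value, w.Path)
	}
}
```

The limits are written before the PID so the process is already capped
by the time it joins the cgroup.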

Go simplifications (~250 lines net deletion):
- Runner interface: Run(ctx, binary, stdin, scratchDir, cmd) — no
  ToolSpec, no mount list, no engine concept. Single localRunner
  implementation; bwrapRunner + containerRunner both deleted.
- health.Probe just looks up pandoc + chromium on PATH; Capabilities
  drops engine kinds.
- Convert.go: ToHTML/ToPDF write to a per-call scratch dir under
  TMPDIR and pass absolute paths; the wrapper bind-mounts the dir.
  No more "/tpl" / "/pdf" mount-point indirection.
- Config drops --convert-pandoc-image, --convert-chromium-image,
  --convert-engine, --convert-podman-socket (OCI engine gone) and
  --convert-cpus (CPU caps don't apply in the new model — wall-clock
  + memory + pids is the cap set). Defaults raised to match the new
  caps the user authorized: mem 512→1024 MiB, pids 100→256,
  timeout 30→60 s.
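
A minimal sketch of what the slimmed-down Runner could look like; it is
not the code in this commit. The Run signature and the three env var
names come from the text above. The struct fields, the childEnv helper,
and the value format (memory cap in bytes) are assumptions made for the
example.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"time"
)

// Runner follows the signature described in the commit message.
type Runner interface {
	Run(ctx context.Context, binary string, stdin []byte, scratchDir string, cmd []string) ([]byte, error)
}

// localRunner just exec's the binary; any sandboxing is the image's job.
type localRunner struct {
	memMiB int // hypothetical field names
	pids   int
}

// childEnv builds the minimal environment the wrapper reads.
// Assumption: ZDDC_CONV_MEM_MAX is expressed in bytes.
func childEnv(scratchDir string, memMiB, pids int) []string {
	env := []string{
		"PATH=" + os.Getenv("PATH"),
		"ZDDC_CONV_MEM_MAX=" + strconv.FormatInt(int64(memMiB)<<20, 10),
		"ZDDC_CONV_PIDS_MAX=" + strconv.Itoa(pids),
	}
	if scratchDir != "" {
		env = append(env, "ZDDC_SCRATCH="+scratchDir)
	}
	return env
}

func (r localRunner) Run(ctx context.Context, binary string, stdin []byte, scratchDir string, cmd []string) ([]byte, error) {
	c := exec.CommandContext(ctx, binary, cmd...)
	c.Env = childEnv(scratchDir, r.memMiB, r.pids)
	c.Stdin = bytes.NewReader(stdin)
	var out, errBuf bytes.Buffer
	c.Stdout, c.Stderr = &out, &errBuf
	if err := c.Run(); err != nil {
		return nil, fmt.Errorf("%s: %w: %s", binary, err, errBuf.String())
	}
	return out.Bytes(), nil
}

func main() {
	var r Runner = localRunner{memMiB: 1024, pids: 256}
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
	defer cancel()
	// "cat" stands in for pandoc so the sketch runs without the image.
	out, err := r.Run(ctx, "cat", []byte("# hello\n"), "", nil)
	if err != nil {
		fmt.Println("run failed:", err)
		return
	}
	fmt.Printf("%s", out)
}
```

Because the runner only sets env vars and exec's a name, swapping the
wrapper for another isolation tool needs no change on this side.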

Image:
- zddc/runtime.Containerfile builds the production runtime image
  (alpine + bubblewrap + pandoc + chromium + font-noto). Two
  COPY statements pull in the wrapper scripts; ln -s symlinks the
  shadow names.
- bitnest dev image mirrors this layout under /var/lib/zddc-dev-build/.

Container privilege required:
- Nested bwrap needs the outer container to permit user + mount
  namespace creation + MS_SLAVE on root. The default seccomp +
  AppArmor profiles block all of these. Quadlet adds:
    --cap-add=ALL
    --security-opt=seccomp=unconfined
    --security-opt=apparmor=unconfined
    --security-opt=unmask=ALL
  Helm chart sets the equivalent via securityContext (capabilities.
  add: SYS_ADMIN, seccompProfile.type: Unconfined, appArmorProfile.
  type: Unconfined). Trade-off documented in AGENTS.md: zddc-server
  RCE now has near-root power within the container, but the bind-
  mount layout still bounds blast radius; bwrap is the real boundary
  between zddc-server and untrusted markdown.

Tests: convert_test.go fully rewritten for the new Runner signature.
Drops TestBwrapArgs_* (functionality moved out of Go) and
TestImageTag (no more image refs). All 15 Go test packages green.

Verified live on bitnest: pandoc --version round-trip exits 0
through the wrapper; MD→DOCX produces a valid Word 2007+ file
end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZDDC 2026-05-19 07:47:58 -05:00
parent 847e082e6e
commit cef7188a77
14 changed files with 691 additions and 1118 deletions


@ -345,19 +345,30 @@ The markdown editor lives at `browse/js/preview-markdown.js` and is mounted as t
## Server-side document conversion (`zddc/internal/convert`)
zddc-server can convert `.md` → DOCX/HTML/PDF on demand at `GET /<path>/foo.md?convert=docx|html|pdf`. Implementation:
zddc-server can convert `.md` → DOCX/HTML/PDF on demand at `GET /<path>/foo.md?convert=docx|html|pdf`.
- **Two engines, probed bwrap → podman → docker.** The first one found on PATH wins; `--convert-engine=` / `ZDDC_CONVERT_ENGINE` forces a choice.
**Architecture.** zddc-server's Go code does the bare minimum: it calls `exec.Command("pandoc", args...)` or `exec.Command("chromium-browser", args...)`. **The sandbox + resource caps live in the IMAGE**, not in Go. In the production runtime image (`zddc/runtime.Containerfile`), `/usr/local/bin/pandoc` and `/usr/local/bin/chromium-browser` are symlinks to `zddc-sandbox-exec`, a shell wrapper that:
- **bwrap (production default).** Wraps `bubblewrap` to run `pandoc` and `chromium-browser` directly in a per-call Linux-namespace sandbox: `--unshare-all --unshare-user-try --die-with-parent --ro-bind /usr /usr ... --proc /proc --dev /dev --tmpfs /tmp --clearenv`. No daemon, no socket, no OCI image pull at conversion time. Binaries are baked into the zddc-server runtime image (`zddc/runtime.Containerfile`) so the operator just runs the image. Configure binary names via `--convert-pandoc-binary` (default `pandoc`) / `--convert-chromium-binary` (default `chromium-browser`; debian/ubuntu installs as `chromium`).
1. Creates a transient cgroup v2 (memory + pids cap from `ZDDC_CONV_MEM_MAX` / `ZDDC_CONV_PIDS_MAX` env), moves itself in.
2. Wraps the real binary at `/usr/bin/<name>` in a bubblewrap sandbox (`--unshare-all --unshare-user-try --die-with-parent --ro-bind /usr /usr ... --proc /proc --dev /dev --tmpfs /tmp --clearenv`).
3. exec's `/usr/bin/<name>` with the original argv.
- **podman / docker (legacy fallback).** Wraps `podman run` / `docker run` with `--rm --pull=missing --network=none --read-only --tmpfs=/tmp:size=256m,exec --memory --cpus --pids-limit --cap-drop=ALL --security-opt=no-new-privileges --env=HOME=/tmp`. Used when the operator wants OCI-image isolation per conversion and already has an engine on PATH. Default images `docker.io/pandoc/latex:latest` (override via `--convert-pandoc-image=` / `ZDDC_CONVERT_PANDOC_IMAGE`) and `docker.io/zenika/alpine-chrome:latest` (override via `--convert-chromium-image=`).
Why this shape: swapping isolation strategies (firejail, systemd-nspawn, podman-run, raw exec for dev) is purely an image concern; the Go code never changes. A separate `zddc-cgroup-init` script runs at container start to satisfy cgroup v2's "no internal processes" constraint: it moves PID 1 into a child cgroup, enables `+memory +pids` in `subtree_control`, then exec's zddc-server. Both scripts live in `zddc/runtime/`.
- Resource caps via `--convert-mem-mib` (default 512), `--convert-cpus` (default "2"), `--convert-pids` (default 100), `--convert-timeout` (default 30s). bwrap stores them advisorily (no cgroup enforcement in this iteration); the OCI engine maps them to `--memory` / `--cpus` / `--pids-limit`.
- I/O via bind mount + stdin/stdout. Pandoc reads markdown from stdin, writes to stdout. The viewer template is bind-mounted read-only at `/tpl`. Chromium reads HTML from a read-write bind mount at `/pdf` and writes the PDF to the same mount; the host reads it back. Mount-spec syntax (`host:target[:ro|:rw]`) is identical across engines; the runner translates to `--ro-bind` / `--bind` (bwrap) or `--volume` (podman/docker).
**Outer-container privileges.** Nested bwrap needs the outer container to permit user + mount namespace creation. Pod Security Standards defaults block this. The helm chart sets `securityContext: capabilities.add: [SYS_ADMIN]`, `seccompProfile.type: Unconfined`, `appArmorProfile.type: Unconfined`. Trade-off: a zddc-server RCE has near-root power within the container's namespace, but the bind-mount layout (overlay fs, no host /home or /usr visible) still bounds the blast radius. The per-conversion bwrap sandbox is the real isolation boundary between zddc-server and untrusted pandoc/chromium.
**Config knobs** (all in `cmd/zddc-server`):
- `--convert-pandoc-binary` (default `pandoc`) / `--convert-chromium-binary` (default `chromium-browser`; `chromium` on debian)
- `--convert-scratch-dir` (default `$TMPDIR`) — host scratch root; the wrapper bind-mounts the per-call subdir
- `--convert-mem-mib` (default 1024) → wrapper's `memory.max`
- `--convert-pids` (default 256) → wrapper's `pids.max`
- `--convert-timeout` (default 60s) → enforced in Go via `context.WithTimeout`
**Other plumbing.**
- I/O via stdin/stdout + scratch dir. Pandoc reads markdown from stdin, writes to stdout. Templates + intermediate HTML + output PDF live in a per-call subdir under the scratch root; the dir's host path is passed to the child via `ZDDC_SCRATCH` so the wrapper bind-mounts it into the sandbox at the same path (no path translation).
- Output cached at `<dir>/.converted/<base>.<ext>` (hidden by the `.` prefix). mtime synced to source so the fast path is a stat-and-serve with no exec. PUT/DELETE/MOVE on the source `.md` purges the sidecars.
- Per-project template variables (client/project/contractor/project_number) come from `.zddc` `convert:` cascade keys. Title/tracking_number/revision/status are derived from the filename via `zddc.ParseFilename`.
- If no sandbox engine is found on PATH, the endpoint serves 503 with a Retry-After. The rest of the server keeps working.
- If pandoc/chromium aren't on PATH (operator running zddc-server outside the runtime image), the endpoint serves 503 with a Retry-After. The rest of the server keeps working. Operators who run zddc-server with raw pandoc/chromium (no wrapper) get a working but unsandboxed conversion endpoint — useful for dev iteration.
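
The sidecar cache described in the list above can be sketched roughly as follows. This is an illustration, not the server's code: `sidecarPath` and `fresh` are hypothetical names, and it assumes `<base>` means the source filename with its `.md` extension stripped.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"time"
)

// sidecarPath maps a source file to <dir>/.converted/<base>.<ext>.
// Assumption: <base> is the filename without its extension.
func sidecarPath(src, ext string) string {
	dir, file := filepath.Split(src)
	base := strings.TrimSuffix(file, filepath.Ext(file))
	return filepath.Join(dir, ".converted", base+"."+ext)
}

// fresh reports whether the cached output can be served without an
// exec: the sidecar's mtime was synced to the source's when written.
func fresh(srcMod, cacheMod time.Time) bool {
	return srcMod.Equal(cacheMod)
}

func main() {
	fmt.Println(sidecarPath("/docs/project/spec.md", "pdf"))

	now := time.Now()
	fmt.Println(fresh(now, now))                  // same mtime: serve from cache
	fmt.Println(fresh(now, now.Add(-time.Minute))) // stale: reconvert
}
```

Comparing for equality rather than "newer than" also invalidates the cache when the source is replaced by an older copy.
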
## Form-data system (`form/` + zddc-server form handler)


@ -403,7 +403,7 @@ Files at the root level are ignored. The grouping folder list and transmittal fo
**Dependencies:** Toast UI Editor v3.2.2 (vendored at `shared/vendor/toastui-editor-all.min.js`, concatenated into `browse/dist/browse.html` at build time). No runtime CDN, no Tailwind.
**Server-mode features:** When the file handle is an `HttpFileHandle` (so `node.url` is set and `state.source === 'server'`), three Download buttons appear in the file header — DOCX/HTML/PDF — fetching `?convert=<fmt>` via `window.zddc.source.downloadConverted()`. Clicks auto-save first if the buffer is dirty so converted bytes reflect what's on screen. The server-side engine is in `zddc/internal/convert` — bwrap is the default sandbox (per-call Linux namespaces, no daemon, pandoc/chromium binaries baked into the runtime image), with podman/docker as legacy OCI-image fallbacks for hosts that already have a container engine.
**Server-mode features:** When the file handle is an `HttpFileHandle` (so `node.url` is set and `state.source === 'server'`), three Download buttons appear in the file header — DOCX/HTML/PDF — fetching `?convert=<fmt>` via `window.zddc.source.downloadConverted()`. Clicks auto-save first if the buffer is dirty so converted bytes reflect what's on screen. The server-side engine is in `zddc/internal/convert`: zddc-server `exec.Command`s `pandoc` and `chromium-browser` directly, and the runtime image's wrapper at `/usr/local/bin/<name>` (see `zddc/runtime.Containerfile` + `zddc/runtime/zddc-sandbox-exec`) handles the per-call cgroup v2 + bubblewrap sandbox between that exec and the real binary at `/usr/bin/<name>`. Isolation strategy lives entirely in the image; swap the wrapper for firejail / nspawn / podman-run and Go doesn't change.
---


@ -64,7 +64,36 @@ spec:
- name: zddc-server
image: {{ printf "%s:%s" .Values.runtimeImage.repository .Values.runtimeImage.tag | quote }}
imagePullPolicy: IfNotPresent
command: ["/zddc/zddc-server"]
# zddc-cgroup-init prepares cgroup v2 subtree_control then
# exec's zddc-server. Required because cgroup v2 forbids
# processes in a cgroup that has child cgroups; the per-
# conversion wrapper (zddc-sandbox-exec) creates child
# cgroups for resource caps, so the init script has to
# move zddc-server itself out of the root cgroup first.
# See zddc/runtime/zddc-cgroup-init in the source repo.
command: ["/usr/local/libexec/zddc-cgroup-init", "/zddc/zddc-server"]
# The conversion sandbox (bwrap, invoked per-call by
# /usr/local/bin/{pandoc,chromium-browser}) needs to create
# user + mount namespaces inside the container. Pod Security
# Standards default policies forbid this; the chart sets the
# minimum securityContext that lets bwrap function. If your
# cluster's admission controller rejects these settings, you
# have two choices: ask the platform team to allow this pod,
# or accept that ?convert= requests get a 503 (the rest of
# zddc-server still works fine without conversion).
securityContext:
capabilities:
add: ["SYS_ADMIN"]
# cap-add SYS_ADMIN alone isn't enough; see the
# zddc/runtime/zddc-sandbox-exec docstring for the full
# set of LSM relaxations required. seccompProfile has been
# a securityContext field since K8s 1.19; appArmorProfile
# needs K8s 1.30+. On older clusters use the annotation:
# container.apparmor.security.beta.kubernetes.io/zddc-server: unconfined
seccompProfile:
type: Unconfined
appArmorProfile:
type: Unconfined
ports:
- name: http
containerPort: 8080


@ -87,29 +87,24 @@ func main() {
"addr", cfg.Addr,
"embedded_apps", embeddedVersionsForLog(embedded))
// Probe the container runtime for the MD→{docx,html,pdf} endpoint.
// Non-fatal: if the host has no podman/docker (or the remote
// socket is unreachable in sidecar mode), conversion requests
// return 503 and everything else keeps working. The probe installs
// the package-level Runner when an engine is found; the configured
// Sandbox probe order is bwrap → podman → docker. The
// production-default bwrap engine reads the binary names below
// (pandoc + chromium are baked into the zddc-server image);
// the legacy OCI engines read the image refs and pull them
// lazily on first conversion via `--pull=missing`. The probe
// installs whichever runner the engine resolves to.
// Probe pandoc + chromium for the MD→{docx,html,pdf} endpoint.
// Non-fatal: if either binary isn't on PATH (operator running
// zddc-server outside the runtime image), conversion requests
// return 503 and everything else keeps working.
//
// SetRemoteURL + SetScratchDir must run BEFORE Probe so the
// OCI-engine path can hit the sidecar socket when one is
// configured; bwrap ignores both.
convert.SetImages(cfg.ConvertPandocImage, cfg.ConvertChromiumImage)
// In the production runtime image, "pandoc" and "chromium-browser"
// on PATH resolve to wrapper scripts at /usr/local/bin/<name>
// that put the real binary into a cgroup v2 + bwrap sandbox
// before exec'ing it. zddc-server is unaware — it just sees
// the corresponding tool's behavior. The wrapper reads
// ZDDC_CONV_MEM_MAX, ZDDC_CONV_PIDS_MAX, and ZDDC_SCRATCH from
// the child env to drive cgroup setup + scratch-dir bind mount.
convert.SetBinaries(cfg.ConvertPandocBinary, cfg.ConvertChromiumBinary)
convert.SetRemoteURL(cfg.ConvertPodmanSocket)
convert.SetScratchDir(cfg.ConvertScratchDir)
probeCtx, probeCancel := context.WithTimeout(context.Background(), 5*time.Second)
convert.Probe(probeCtx, cfg.ConvertEngine)
convert.Probe(probeCtx)
probeCancel()
convert.ConfigureLimits(cfg.ConvertMemMiB, cfg.ConvertCPUs, cfg.ConvertPIDs, cfg.ConvertTimeout)
convert.ConfigureLimits(cfg.ConvertMemMiB, cfg.ConvertPIDs, cfg.ConvertTimeout)
// Client mode short-circuit: when cfg.Upstream is set, this binary
// runs as a downstream proxy/cache/mirror rather than a master.


@ -48,26 +48,18 @@ type Config struct {
ArchiveRescanInterval time.Duration // --archive-rescan-interval / ZDDC_ARCHIVE_RESCAN_INTERVAL — periodic full re-walk of the archive index. Covers SMB/CIFS where inotify misses cross-client writes. Default 60s; 0 to disable.
// MD→{docx,html,pdf} conversion endpoint (see internal/convert).
// The server shells out to upstream pandoc + chromium container
// images via podman or docker, pulling each on first use via
// production default. The engine probe order is bwrap → podman →
// docker; the first one found on PATH wins. bwrap runs the
// pandoc + chromium binaries baked into the zddc-server image
// in a per-call Linux-namespace sandbox (no daemon, no socket,
// no OCI image pull). podman/docker are legacy fallbacks for
// hosts that already have a container engine and want OCI-image
// isolation per conversion.
ConvertPandocImage string // --convert-pandoc-image / ZDDC_CONVERT_PANDOC_IMAGE — image for MD→DOCX/HTML when the OCI engine is selected. Default docker.io/pandoc/latex:latest.
ConvertChromiumImage string // --convert-chromium-image / ZDDC_CONVERT_CHROMIUM_IMAGE — image for HTML→PDF when the OCI engine is selected. Default docker.io/zenika/alpine-chrome:latest.
ConvertPandocBinary string // --convert-pandoc-binary / ZDDC_CONVERT_PANDOC_BINARY — pandoc binary name (PATH-resolved) when the bwrap engine is selected. Default "pandoc".
ConvertChromiumBinary string // --convert-chromium-binary / ZDDC_CONVERT_CHROMIUM_BINARY — chromium binary name (PATH-resolved) when the bwrap engine is selected. Default "chromium-browser" (alpine); set to "chromium" on debian.
ConvertEngine string // --convert-engine / ZDDC_CONVERT_ENGINE — override sandbox binary (default: probe for bwrap, then podman, then docker).
ConvertPodmanSocket string // --convert-podman-socket / ZDDC_CONVERT_PODMAN_SOCKET — when non-empty, run podman in remote mode against this Unix socket (e.g. unix:///var/run/podman/podman.sock). Used with the Kubernetes sidecar pattern so zddc-server's own pod stays unprivileged.
ConvertScratchDir string // --convert-scratch-dir / ZDDC_CONVERT_SCRATCH_DIR — directory used for per-conversion scratch (template + HTML/PDF intermediates). Must be a path the remote podman can see at the same path. Empty = use $TMPDIR (local-mode default).
ConvertMemMiB int // --convert-mem-mib / ZDDC_CONVERT_MEM_MIB — per-container memory cap in MiB. Default 512.
ConvertCPUs string // --convert-cpus / ZDDC_CONVERT_CPUS — per-container CPU limit. Default "2".
ConvertPIDs int // --convert-pids / ZDDC_CONVERT_PIDS — per-container PID limit. Default 100.
ConvertTimeout time.Duration // --convert-timeout / ZDDC_CONVERT_TIMEOUT — per-conversion wall clock. Default 30s.
// zddc-server exec's `pandoc` and `chromium-browser` directly.
// In the production runtime image those names resolve to wrapper
// scripts at /usr/local/bin/ that put the real binary into a
// cgroup v2 + bubblewrap sandbox before exec'ing it — see
// zddc/runtime.Containerfile + zddc/runtime/zddc-sandbox-exec.
// zddc-server is unaware of sandboxing; the image owns it.
ConvertPandocBinary string // --convert-pandoc-binary / ZDDC_CONVERT_PANDOC_BINARY — pandoc binary name (PATH-resolved) or absolute path. Default "pandoc". Resolves to the wrapper script in the runtime image.
ConvertChromiumBinary string // --convert-chromium-binary / ZDDC_CONVERT_CHROMIUM_BINARY — chromium binary name (PATH-resolved) or absolute path. Default "chromium-browser" (alpine); set to "chromium" on debian.
ConvertScratchDir string // --convert-scratch-dir / ZDDC_CONVERT_SCRATCH_DIR — directory used for per-conversion scratch (template + HTML/PDF intermediates). The wrapper bind-mounts this into the sandbox at the same path. Empty = use $TMPDIR.
ConvertMemMiB int // --convert-mem-mib / ZDDC_CONVERT_MEM_MIB — per-conversion memory cap in MiB (advisory; passed to the wrapper via ZDDC_CONV_MEM_MAX, applied as cgroup v2 memory.max). Default 1024.
ConvertPIDs int // --convert-pids / ZDDC_CONVERT_PIDS — per-conversion PID cap (passed to the wrapper via ZDDC_CONV_PIDS_MAX, applied as cgroup v2 pids.max). Default 256.
ConvertTimeout time.Duration // --convert-timeout / ZDDC_CONVERT_TIMEOUT — per-conversion wall clock (enforced in zddc-server via context.WithTimeout). Default 60s.
}
// ErrHelpRequested is returned by Load when --help is passed; the caller
@ -146,28 +138,18 @@ func Load(args []string) (Config, error) {
"Maximum PUT body size in bytes for the file API. Default 256 MiB. Larger requests are rejected with 413.")
archiveRescanIntervalFlag := fs.Duration("archive-rescan-interval", parseDurationOrDefault(os.Getenv("ZDDC_ARCHIVE_RESCAN_INTERVAL"), 60*time.Second),
"Periodic full re-walk of the archive index. Required on SMB/CIFS-backed roots where inotify misses cross-client writes. Default 60s; set 0 to disable.")
convertPandocImageFlag := fs.String("convert-pandoc-image", getEnv("ZDDC_CONVERT_PANDOC_IMAGE", "docker.io/pandoc/latex:latest"),
"Pandoc OCI image for MD→DOCX / MD→HTML, used only when the OCI engine (podman/docker) is selected. Pulled on first use via --pull=missing.")
convertChromiumImageFlag := fs.String("convert-chromium-image", getEnv("ZDDC_CONVERT_CHROMIUM_IMAGE", "docker.io/zenika/alpine-chrome:latest"),
"Chromium OCI image for HTML→PDF, used only when the OCI engine is selected. Pulled on first use via --pull=missing.")
convertPandocBinaryFlag := fs.String("convert-pandoc-binary", getEnv("ZDDC_CONVERT_PANDOC_BINARY", "pandoc"),
"Pandoc binary name (PATH-resolved) when the bwrap engine is selected. Default \"pandoc\".")
"Pandoc binary name (PATH-resolved) or absolute path. Default \"pandoc\". In the runtime image this resolves to the wrapper at /usr/local/bin/pandoc which sandboxes the real binary.")
convertChromiumBinaryFlag := fs.String("convert-chromium-binary", getEnv("ZDDC_CONVERT_CHROMIUM_BINARY", "chromium-browser"),
"Chromium binary name (PATH-resolved) when the bwrap engine is selected. Default \"chromium-browser\" (alpine); set to \"chromium\" on debian/ubuntu.")
convertEngineFlag := fs.String("convert-engine", os.Getenv("ZDDC_CONVERT_ENGINE"),
"Conversion sandbox override (default: probe for bwrap, then podman, then docker).")
convertPodmanSocketFlag := fs.String("convert-podman-socket", os.Getenv("ZDDC_CONVERT_PODMAN_SOCKET"),
"Run podman in remote mode against this Unix socket URL (e.g. unix:///var/run/podman/podman.sock). When set, the engine binary is invoked as `podman --remote --url=<this> run …`; the actual container creation happens in whatever process owns the socket (typically a podman-system-service sidecar). Empty = local mode.")
"Chromium binary name (PATH-resolved) or absolute path. Default \"chromium-browser\" (alpine); set to \"chromium\" on debian/ubuntu.")
convertScratchDirFlag := fs.String("convert-scratch-dir", os.Getenv("ZDDC_CONVERT_SCRATCH_DIR"),
"Scratch directory for per-conversion intermediates (template, HTML, PDF). In remote mode this MUST be a path that the podman-service side can see at the same path — typically a shared emptyDir mounted at the same mountPath in both containers. Empty = use $TMPDIR (local mode).")
convertMemMiBFlag := fs.Int("convert-mem-mib", parseIntOrDefault(os.Getenv("ZDDC_CONVERT_MEM_MIB"), 512),
"Per-conversion container memory limit in MiB. Default 512.")
convertCPUsFlag := fs.String("convert-cpus", getEnv("ZDDC_CONVERT_CPUS", "2"),
"Per-conversion container CPU limit (passed to --cpus). Default 2.")
convertPIDsFlag := fs.Int("convert-pids", parseIntOrDefault(os.Getenv("ZDDC_CONVERT_PIDS"), 100),
"Per-conversion container PID limit. Default 100.")
convertTimeoutFlag := fs.Duration("convert-timeout", parseDurationOrDefault(os.Getenv("ZDDC_CONVERT_TIMEOUT"), 30*time.Second),
"Per-conversion wall-clock timeout. Default 30s.")
"Scratch directory for per-conversion intermediates (template, HTML, PDF). The runtime image's wrapper bind-mounts this into the sandbox at the same path. Empty = use $TMPDIR.")
convertMemMiBFlag := fs.Int("convert-mem-mib", parseIntOrDefault(os.Getenv("ZDDC_CONVERT_MEM_MIB"), 1024),
"Per-conversion memory limit in MiB (advisory; passed to the runtime-image wrapper via ZDDC_CONV_MEM_MAX, applied as cgroup v2 memory.max). Default 1024.")
convertPIDsFlag := fs.Int("convert-pids", parseIntOrDefault(os.Getenv("ZDDC_CONVERT_PIDS"), 256),
"Per-conversion PID limit (passed to the runtime-image wrapper via ZDDC_CONV_PIDS_MAX, applied as cgroup v2 pids.max). Default 256.")
convertTimeoutFlag := fs.Duration("convert-timeout", parseDurationOrDefault(os.Getenv("ZDDC_CONVERT_TIMEOUT"), 60*time.Second),
"Per-conversion wall-clock timeout (enforced in zddc-server via context.WithTimeout). Default 60s.")
accessLogFlag := fs.String("access-log", os.Getenv("ZDDC_ACCESS_LOG"),
"Tee structured access logs to this file (JSON, size-rotated). "+
"Default: <ZDDC_ROOT>/.zddc.d/logs/access-<hostname>.log. "+
@ -239,15 +221,10 @@ func Load(args []string) (Config, error) {
AppsPubKey: *appsPubKeyFlag,
MaxWriteBytes: *maxWriteBytesFlag,
ArchiveRescanInterval: *archiveRescanIntervalFlag,
ConvertPandocImage: *convertPandocImageFlag,
ConvertChromiumImage: *convertChromiumImageFlag,
ConvertPandocBinary: *convertPandocBinaryFlag,
ConvertChromiumBinary: *convertChromiumBinaryFlag,
ConvertEngine: *convertEngineFlag,
ConvertPodmanSocket: *convertPodmanSocketFlag,
ConvertScratchDir: *convertScratchDirFlag,
ConvertMemMiB: *convertMemMiBFlag,
ConvertCPUs: *convertCPUsFlag,
ConvertPIDs: *convertPIDsFlag,
ConvertTimeout: *convertTimeoutFlag,
}


@ -1,20 +1,15 @@
// Package convert turns a markdown source byte-buffer into DOCX, HTML,
// or PDF. Pandoc handles MD↔DOCX and MD→HTML; headless Chromium handles
// HTML→PDF. Each conversion runs inside an isolating sandbox so an
// untrusted source-markdown can't reach the host's filesystem or
// network even if it drives the binary to RCE.
// or PDF by exec'ing pandoc and chromium-browser. Each conversion runs
// inside a sandbox provided by the IMAGE — typically a wrapper script
// at /usr/local/bin/<binary> that puts the real binary into a cgroup
// v2 + bubblewrap sandbox before exec'ing it. See
// zddc/runtime.Containerfile for the production setup.
//
// Engine probe order (call Probe once at startup, first hit wins):
//
// 1. bwrap (production default). Runs the pandoc/chromium binaries
// baked into the zddc-server runtime image directly under
// bubblewrap: namespace-isolated, no network, read-only /usr, a
// 256 MiB tmpfs /tmp, minimal proc/dev. Configure binary names
// via SetBinaries; defaults are `pandoc` and `chromium-browser`.
// 2. podman / docker (legacy fallback). Runs each conversion inside
// an OCI container pulled lazily via `--pull=missing`. Defaults
// `docker.io/pandoc/latex:latest` + `docker.io/zenika/alpine-
// chrome:latest`; configure via SetImages.
// zddc-server's Go code is unaware of sandboxing: it just exec's
// "pandoc" or "chromium-browser" and gets the corresponding tool's
// behavior back. Operators who want a different isolation strategy
// (firejail, systemd-nspawn, podman-run, raw exec for dev) replace
// the wrapper script in their image; the Go binary doesn't change.
//
// Public surface:
//
@ -22,16 +17,13 @@
// ToHTML(ctx, source, meta) → []byte (standalone HTML)
// ToPDF (ctx, source, meta) → []byte (PDF, via HTML + chromium)
//
// Probe(ctx, override) → Capabilities (call once at startup)
// Probe(ctx) → Capabilities (call once at startup)
// Available() → (Capabilities, bool)
// SetImages(pandoc, chromium) — install OCI image refs from config
// SetBinaries(pandoc, chromium) — install bwrap binary names from config
// SetBinaries(pandoc, chromium) — install binary names from config
// SetScratchDir(dir) — install scratch root from config
//
// All three converters are safe for concurrent use; each call gets a
// fresh sandbox. The pandoc binary (or pandoc/latex image's entrypoint)
// reads pandoc flags directly; the chromium binary (or alpine-chrome
// image's entrypoint) reads chromium-browser flags. No `sh -c`
// wrappers, no shell quoting.
// fresh scratch dir + (image-provided) sandbox.
//
// Metadata maps to the placeholders consumed by viewer-template.html.
// title/tracking_number/revision/status/is_draft typically come from
@ -66,55 +58,33 @@ type Metadata struct {
NoTOC bool
}
// Default tool refs. The bwrap engine (default since v0.0.x) reads the
// Binary fields below; the legacy containerRunner reads the Image
// fields. The convert entry points populate both into a ToolSpec so
// whichever engine is installed picks the field it needs.
// Default binary names. The runtime image installs WRAPPER scripts at
// /usr/local/bin/pandoc and /usr/local/bin/chromium-browser (shadowing
// the real binaries in /usr/bin/) so these names resolve through the
// sandbox automatically. Operators running zddc-server outside the
// runtime image with raw binaries on PATH still get a working
// conversion endpoint — just without the per-call sandbox.
//
// pandoc/latex carries TeX Live for native PDF too, so the image is a
// superset of pandoc/core. The bwrap engine doesn't pay that cost —
// each binary is installed from the host's package manager (alpine:
// pandoc-cli + chromium) and the image grows by ≈ 200 MB once.
const (
DefaultPandocImage = "docker.io/pandoc/latex:latest"
DefaultChromiumImage = "docker.io/zenika/alpine-chrome:latest"
DefaultPandocBinary = "pandoc"
// Alpine's chromium package installs the binary as "chromium-browser".
// Debian/Ubuntu ships "chromium". Operators override via
// Alpine's chromium package installs the binary as "chromium-browser";
// debian/ubuntu ships "chromium". Operators override via
// --convert-chromium-binary when the package on their image differs.
const (
DefaultPandocBinary = "pandoc"
DefaultChromiumBinary = "chromium-browser"
)
var (
pandocImage atomic.Pointer[string]
chromiumImage atomic.Pointer[string]
pandocBinary atomic.Pointer[string]
chromiumBinary atomic.Pointer[string]
scratchDir atomic.Pointer[string]
)
// SetImages installs the OCI image refs used by the legacy
// containerRunner engine. The bwrap engine ignores these and reads
// the binary names installed via SetBinaries instead. Empty values
// keep the previous setting (or the DefaultPandocImage /
// DefaultChromiumImage constants on first call). Called from
// cmd/zddc-server/main.go after flag parsing.
func SetImages(pandoc, chromium string) {
if pandoc != "" {
s := pandoc
pandocImage.Store(&s)
}
if chromium != "" {
s := chromium
chromiumImage.Store(&s)
}
}
// SetBinaries installs the host-binary names used by the bwrap engine.
// Empty values keep the previous setting (or the DefaultPandocBinary /
// SetBinaries installs the binary names used by Probe/Run. Empty
// values keep the previous setting (or the DefaultPandocBinary /
// DefaultChromiumBinary constants on first call). The values are
// PATH-resolved names (e.g. "pandoc", "chromium-browser") or absolute
// paths. Called from cmd/zddc-server/main.go after flag parsing.
// PATH-resolved names (e.g. "pandoc", "chromium-browser") or
// absolute paths. Called from cmd/zddc-server/main.go after flag
// parsing.
func SetBinaries(pandoc, chromium string) {
if pandoc != "" {
s := pandoc
@ -126,12 +96,11 @@ func SetBinaries(pandoc, chromium string) {
}
}
// SetScratchDir installs the host-side scratch root used for per-call
// intermediates (template, HTML, PDF). Empty means "use $TMPDIR" — the
// local-mode default. In remote mode this MUST be a path the podman-
// service sidecar can see at the same mountpoint, typically a shared
// emptyDir mounted at /work in both containers. Called from
// cmd/zddc-server/main.go after flag parsing.
// SetScratchDir installs the host-side scratch root used for
// per-call intermediates (template, HTML, PDF). Empty means "use
// $TMPDIR". The runtime-image wrapper bind-mounts the per-call
// scratch dir into its sandbox at the same path, so any path under
// this root works.
func SetScratchDir(dir string) {
s := dir
scratchDir.Store(&s)
@ -144,20 +113,6 @@ func currentScratchDir() string {
return ""
}
func currentPandocImage() string {
if p := pandocImage.Load(); p != nil && *p != "" {
return *p
}
return DefaultPandocImage
}
func currentChromiumImage() string {
if p := chromiumImage.Load(); p != nil && *p != "" {
return *p
}
return DefaultChromiumImage
}
func currentPandocBinary() string {
if p := pandocBinary.Load(); p != nil && *p != "" {
return *p
@ -172,20 +127,10 @@ func currentChromiumBinary() string {
return DefaultChromiumBinary
}
// pandocTool / chromiumTool build the ToolSpec passed to Runner.Run.
// Both fields are populated so whichever engine is installed picks
// the one it needs (bwrap reads Binary; containerRunner reads Image).
func pandocTool() ToolSpec {
return ToolSpec{Image: currentPandocImage(), Binary: currentPandocBinary()}
}
func chromiumTool() ToolSpec {
return ToolSpec{Image: currentChromiumImage(), Binary: currentChromiumBinary()}
}
// ToDocx renders source markdown to DOCX bytes. One container run via
// the pandoc image. Caller passes the full file content (envelope +
// body); pandoc handles `markdown+yaml_metadata_block` natively.
// ToDocx renders source markdown to DOCX bytes. Single pandoc exec;
// no scratch dir needed (stdin → stdout). The caller passes the
// full file content (envelope + body); pandoc handles
// `markdown+yaml_metadata_block` natively.
func ToDocx(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
r := currentRunner()
if r == nil {
@@ -198,13 +143,14 @@ func ToDocx(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
}
cmd = append(cmd, metadataArgs(m)...)
cmd = append(cmd, "-")
return r.Run(ctx, pandocTool(), source, nil, cmd)
return r.Run(ctx, currentPandocBinary(), source, "", cmd)
}
// ToHTML renders source markdown to standalone HTML using
// viewer-template.html. Embeds CSS + images via --embed-resources.
// Template + custom.css are bind-mounted into the container at /tpl
// from a per-call scratch dir.
// Template + custom.css live in a per-call scratch dir; the host
// path is passed via ZDDC_SCRATCH so the wrapper bind-mounts it
// into the sandbox at the same path.
func ToHTML(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
r := currentRunner()
if r == nil {
@@ -216,6 +162,7 @@ func ToHTML(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
}
defer os.RemoveAll(scratch)
tplPath := filepath.Join(scratch, "viewer-template.html")
cmd := []string{
"--from=markdown+yaml_metadata_block",
"--to=html5",
@@ -224,29 +171,27 @@ func ToHTML(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
"--section-divs",
"--id-prefix=",
"--html-q-tags",
"--template=/tpl/viewer-template.html",
"--template=" + tplPath,
}
if !m.NoTOC {
cmd = append(cmd, "--toc", "--toc-depth=6")
}
cmd = append(cmd, metadataArgs(m)...)
cmd = append(cmd, "--output=-", "-")
mounts := []string{scratch + ":/tpl:ro"}
return r.Run(ctx, pandocTool(), source, mounts, cmd)
return r.Run(ctx, currentPandocBinary(), source, scratch, cmd)
}
// ToPDF renders source markdown to PDF in two stages: pandoc produces
// HTML using viewer-template.html (stage 1, pandoc image), then headless
// Chromium prints that HTML to PDF (stage 2, chromium image). The
// two-stage choice preserves the print-media CSS already authored in
// viewer-template.html — pandoc's native --pdf-engine path uses LaTeX
// ToPDF renders source markdown to PDF in two stages: pandoc
// produces HTML using viewer-template.html (stage 1), then headless
// chromium prints that HTML to PDF (stage 2). The two-stage choice
// preserves the print-media CSS already authored in viewer-
// template.html — pandoc's native --pdf-engine path uses LaTeX
// which would bypass it entirely.
//
// Chromium runs from the alpine-chrome image whose entrypoint is
// `chromium-browser`; our cmd is the flag list passed straight to that
// binary. The host scratch dir is bind-mounted read-write at /pdf so
// chromium can write out.pdf and we read it back afterward.
// Both stages share a single per-call scratch dir: pandoc writes
// `in.html` and chromium reads it, then chromium writes `out.pdf`
// which the host reads back. The wrapper bind-mounts the scratch
// dir read-write into the sandbox at the same path.
func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
html, err := ToHTML(ctx, source, m)
if err != nil {
@@ -271,17 +216,11 @@ func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
return nil, err
}
mounts := []string{scratch + ":/pdf:rw"}
// alpine-chrome's entrypoint is `chromium-browser`. --no-sandbox is
// required because the container drops CAP_SYS_ADMIN; the threat
// model is "malicious markdown drives chromium RCE", contained by
// --network=none + --cap-drop=ALL + --read-only + tmpfs.
//
// --disable-dev-shm-usage: without this, chromium tries to allocate
// shared memory under /dev/shm, which our --read-only container
// can't write to. The flag tells chromium to fall back to /tmp,
// which is a writable tmpfs (sized in runner.go). Standard fix for
// chromium-in-container; required by every CI/headless setup.
// --no-sandbox: the wrapper provides the sandbox; chromium's
// own setuid sandbox would conflict (and fails inside our
// user namespace anyway). --disable-dev-shm-usage: chromium
// normally allocates shared memory under /dev/shm, which our
// sandbox doesn't expose; the flag makes it use /tmp (the
// wrapper's tmpfs) instead.
cmd := []string{
"--headless",
"--disable-gpu",
@@ -290,10 +229,10 @@ func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
"--user-data-dir=/tmp/chrome",
"--no-pdf-header-footer",
"--virtual-time-budget=10000",
"--print-to-pdf=/pdf/out.pdf",
"file:///pdf/in.html",
"--print-to-pdf=" + pdfPath,
"file://" + htmlPath,
}
if _, err := r.Run(ctx, chromiumTool(), nil, mounts, cmd); err != nil {
if _, err := r.Run(ctx, currentChromiumBinary(), nil, scratch, cmd); err != nil {
return nil, err
}
@@ -303,7 +242,7 @@ func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
}
if len(out) < 4 || string(out[:4]) != "%PDF" {
return nil, &ConvertError{
Tool: "chromium",
Tool: currentChromiumBinary(),
ExitCode: 0,
Stderr: "chromium did not produce a valid PDF",
Cause: fmt.Errorf("invalid PDF magic in output (got %d bytes)", len(out)),
@@ -312,9 +251,9 @@ func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
return out, nil
}
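The stage-2 call shape can be sketched as two small pure functions. This is an illustration under assumptions, not the code in this diff: chromiumPDFArgs and looksLikePDF are hypothetical names, and the exact flag order is assumed because part of the argv is hidden between hunks. The flags themselves come from the lines and comments above.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
)

// chromiumPDFArgs builds the headless-chromium argv for one conversion.
// Both file paths are absolute paths under the shared scratch dir, so
// they resolve identically on the host and inside the sandbox.
func chromiumPDFArgs(scratch string) []string {
	htmlPath := filepath.Join(scratch, "in.html")
	pdfPath := filepath.Join(scratch, "out.pdf")
	return []string{
		"--headless",
		"--disable-gpu",
		"--no-sandbox",
		"--disable-dev-shm-usage",
		"--user-data-dir=/tmp/chrome",
		"--no-pdf-header-footer",
		"--virtual-time-budget=10000",
		"--print-to-pdf=" + pdfPath,
		"file://" + htmlPath,
	}
}

// looksLikePDF reports whether b starts with the "%PDF" magic bytes,
// the same check ToPDF applies to the file it reads back.
func looksLikePDF(b []byte) bool {
	return bytes.HasPrefix(b, []byte("%PDF"))
}

func main() {
	for _, a := range chromiumPDFArgs("/tmp/zddc-example") {
		fmt.Println(a)
	}
	fmt.Println(looksLikePDF([]byte("%PDF-1.7\n")))
}
```

Keeping the argv builder free of I/O is what lets the tests further down assert the flags against a recording runner without a real chromium.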
// metadataArgs renders Metadata into pandoc -V flags. Order is stable
// so test fixtures don't churn. Empty values are omitted (the template
// uses $if(...)$ blocks).
// metadataArgs renders Metadata into pandoc -V flags. Order is
// stable so test fixtures don't churn. Empty values are omitted
// (the template uses $if(...)$ blocks).
func metadataArgs(m Metadata) []string {
var out []string
add := func(k, v string) {

@@ -10,25 +10,25 @@ import (
)
// fakeRunner records the args it was invoked with and replays canned
// responses. Lets us assert the command lines + image refs without
// needing podman.
// responses. Lets us assert command lines + binary refs + scratch
// dirs without needing actual pandoc.
type fakeRunner struct {
mu sync.Mutex
calls [][]string
tools []ToolSpec
binaries []string
stdin [][]byte
mounts [][]string
scratchDir []string
resp []byte
err error
}
func (f *fakeRunner) Run(_ context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error) {
func (f *fakeRunner) Run(_ context.Context, binary string, stdin []byte, scratchDir string, cmd []string) ([]byte, error) {
f.mu.Lock()
defer f.mu.Unlock()
f.calls = append(f.calls, append([]string(nil), cmd...))
f.tools = append(f.tools, tool)
f.binaries = append(f.binaries, binary)
f.stdin = append(f.stdin, append([]byte(nil), stdin...))
f.mounts = append(f.mounts, append([]string(nil), mounts...))
f.scratchDir = append(f.scratchDir, scratchDir)
return f.resp, f.err
}
@@ -38,14 +38,14 @@ func (f *fakeRunner) lastCall() (string, []string) {
if len(f.calls) == 0 {
return "", nil
}
return f.tools[len(f.tools)-1].Image, f.calls[len(f.calls)-1]
return f.binaries[len(f.binaries)-1], f.calls[len(f.calls)-1]
}
func TestToDocx_UsesPandocImage(t *testing.T) {
func TestToDocx_UsesPandocBinary(t *testing.T) {
f := &fakeRunner{resp: []byte("FAKE-DOCX")}
InstallRunner(f)
t.Cleanup(func() { InstallRunner(nil) })
SetImages("docker.io/pandoc/latex:latest", "")
SetBinaries("pandoc", "chromium-browser")
out, err := ToDocx(context.Background(), []byte("# Hello\n"), Metadata{
Title: "Hello",
@@ -57,9 +57,9 @@ func TestToDocx_UsesPandocImage(t *testing.T) {
if string(out) != "FAKE-DOCX" {
t.Errorf("unexpected output: %q", out)
}
image, call := f.lastCall()
if image != "docker.io/pandoc/latex:latest" {
t.Errorf("expected pandoc image, got %q", image)
binary, call := f.lastCall()
if binary != "pandoc" {
t.Errorf("expected pandoc binary, got %q", binary)
}
if !contains(call, "--to=docx") {
t.Errorf("missing --to=docx: %v", call)
@@ -74,35 +74,40 @@ func TestToDocx_UsesPandocImage(t *testing.T) {
if call[len(call)-1] != "-" {
t.Errorf("expected stdin marker as last arg, got %q", call[len(call)-1])
}
// ToDocx is stdin → stdout — no scratch dir needed.
if f.scratchDir[len(f.scratchDir)-1] != "" {
t.Errorf("ToDocx should not need a scratch dir, got %q", f.scratchDir[len(f.scratchDir)-1])
}
}
func TestToHTML_UsesTemplateAndMountsScratch(t *testing.T) {
func TestToHTML_UsesTemplateFromScratchDir(t *testing.T) {
f := &fakeRunner{resp: []byte("<html>fake</html>")}
InstallRunner(f)
t.Cleanup(func() { InstallRunner(nil) })
SetImages("docker.io/pandoc/latex:latest", "")
SetBinaries("pandoc", "chromium-browser")
_, err := ToHTML(context.Background(), []byte("# Hi\n"), Metadata{Title: "Hi"})
if err != nil {
t.Fatalf("ToHTML: %v", err)
}
image, call := f.lastCall()
if image != "docker.io/pandoc/latex:latest" {
t.Errorf("expected pandoc image, got %q", image)
binary, call := f.lastCall()
if binary != "pandoc" {
t.Errorf("expected pandoc binary, got %q", binary)
}
if !contains(call, "--template=/tpl/viewer-template.html") {
t.Errorf("template flag missing: %v", call)
// Template flag must reference an absolute path under the scratch
// dir (no /tpl indirection anymore — the wrapper bind-mounts the
// scratch dir at its own path, so absolute host paths just work).
scratch := f.scratchDir[len(f.scratchDir)-1]
if scratch == "" {
t.Fatalf("ToHTML must pass a scratch dir to the runner")
}
wantTpl := "--template=" + scratch + "/viewer-template.html"
if !contains(call, wantTpl) {
t.Errorf("template flag missing/wrong; want %q in %v", wantTpl, call)
}
if !contains(call, "--toc") {
t.Errorf("TOC flag missing (default NoTOC=false): %v", call)
}
if len(f.mounts) == 0 || len(f.mounts[0]) == 0 {
t.Fatalf("expected at least one bind mount for /tpl")
}
mount := f.mounts[0][0]
if !strings.Contains(mount, ":/tpl:") {
t.Errorf("mount missing /tpl: %q", mount)
}
}
func TestToHTML_NoTOCSuppressesTOC(t *testing.T) {
@@ -120,9 +125,9 @@ func TestToHTML_NoTOCSuppressesTOC(t *testing.T) {
}
}
// recordingRunner records every call and returns canned responses
// in sequence. Lets ToPDF tests assert the two-stage pipeline
// (pandoc image then chromium image).
// recordingRunner records every call and returns canned responses in
// sequence. Lets ToPDF tests assert the two-stage pipeline (pandoc
// then chromium).
type recordingRunner struct {
mu sync.Mutex
calls []recordedCall
@@ -132,18 +137,18 @@ type recordingRunner struct {
}
type recordedCall struct {
image string
binary string
cmd []string
mounts []string
scratch string
}
func (r *recordingRunner) Run(_ context.Context, tool ToolSpec, _ []byte, mounts []string, cmd []string) ([]byte, error) {
func (r *recordingRunner) Run(_ context.Context, binary string, _ []byte, scratch string, cmd []string) ([]byte, error) {
r.mu.Lock()
defer r.mu.Unlock()
r.calls = append(r.calls, recordedCall{
image: tool.Image,
binary: binary,
cmd: append([]string(nil), cmd...),
mounts: append([]string(nil), mounts...),
scratch: scratch,
})
if r.cursor >= len(r.resp) {
return nil, nil
@@ -169,57 +174,63 @@ func TestScratchDir_UsedByToHTML(t *testing.T) {
if err != nil {
t.Fatalf("ToHTML: %v", err)
}
if len(f.mounts) == 0 || len(f.mounts[0]) == 0 {
t.Fatalf("expected at least one mount")
if len(f.scratchDir) == 0 {
t.Fatalf("expected a scratch dir to be passed to the runner")
}
mount := f.mounts[0][0] // "<host>:/tpl:ro"
if !strings.HasPrefix(mount, scratchRoot+"/") {
t.Errorf("scratch dir not under configured root: %q (root=%q)", mount, scratchRoot)
got := f.scratchDir[0]
if !strings.HasPrefix(got, scratchRoot+"/") {
t.Errorf("scratch dir not under configured root: %q (root=%q)", got, scratchRoot)
}
}
func TestToPDF_TwoStagePipeline(t *testing.T) {
// Stage 1: pandoc emits HTML. Stage 2: chromium reads HTML from
// the bind mount and writes /pdf/out.pdf. The fake runner can't
// the scratch dir and writes out.pdf there. The fake runner can't
// actually write the PDF, so we expect ToPDF to fail at the
// read-back step — but we can still assert the two-stage call
// shape and the right image per stage.
// shape and the right binary per stage.
r := &recordingRunner{
resp: [][]byte{
[]byte("<html><body>fake</body></html>"), // stage 1 stdout
nil, // stage 2 stdout (chromium writes PDF to bind mount)
nil, // stage 2 stdout (chromium writes PDF to scratch)
},
}
InstallRunner(r)
t.Cleanup(func() { InstallRunner(nil) })
SetImages("docker.io/pandoc/latex:latest", "docker.io/zenika/alpine-chrome:latest")
SetBinaries("pandoc", "chromium-browser")
_, err := ToPDF(context.Background(), []byte("# Hi\n"), Metadata{})
// PDF read-back will fail (fake runner didn't write the file) —
// that's expected for this test which only inspects the call
// shape.
// that's expected for this test which only inspects the call shape.
if err == nil {
t.Fatalf("expected error from PDF read-back; got nil")
}
if len(r.calls) != 2 {
t.Fatalf("expected 2 container calls (pandoc + chromium); got %d", len(r.calls))
t.Fatalf("expected 2 calls (pandoc + chromium); got %d", len(r.calls))
}
if r.calls[0].image != "docker.io/pandoc/latex:latest" {
t.Errorf("stage 1 image: got %q want pandoc/latex", r.calls[0].image)
if r.calls[0].binary != "pandoc" {
t.Errorf("stage 1 binary: got %q want pandoc", r.calls[0].binary)
}
if r.calls[1].image != "docker.io/zenika/alpine-chrome:latest" {
t.Errorf("stage 2 image: got %q want alpine-chrome", r.calls[1].image)
if r.calls[1].binary != "chromium-browser" {
t.Errorf("stage 2 binary: got %q want chromium-browser", r.calls[1].binary)
}
// Stage 2 must include the --print-to-pdf flag pointing at /pdf.
if !contains(r.calls[1].cmd, "--print-to-pdf=/pdf/out.pdf") {
t.Errorf("chromium call missing --print-to-pdf flag: %v", r.calls[1].cmd)
// Stage 2 must include --print-to-pdf pointing at an absolute
// path under the scratch dir.
stage2 := r.calls[1]
if stage2.scratch == "" {
t.Fatalf("chromium call must have a scratch dir")
}
if !contains(r.calls[1].cmd, "--no-sandbox") {
t.Errorf("chromium call missing --no-sandbox: %v", r.calls[1].cmd)
wantPDF := "--print-to-pdf=" + stage2.scratch + "/out.pdf"
if !contains(stage2.cmd, wantPDF) {
t.Errorf("chromium call missing --print-to-pdf=%s/out.pdf: %v", stage2.scratch, stage2.cmd)
}
// Stage 2's bind mount must be writable (chromium writes the PDF).
if len(r.calls[1].mounts) == 0 || !strings.Contains(r.calls[1].mounts[0], ":rw") {
t.Errorf("chromium mount must be :rw, got %v", r.calls[1].mounts)
if !contains(stage2.cmd, "--no-sandbox") {
t.Errorf("chromium call missing --no-sandbox: %v", stage2.cmd)
}
// Stage 2 chromium reads file://<scratch>/in.html.
wantHTML := "file://" + stage2.scratch + "/in.html"
if !contains(stage2.cmd, wantHTML) {
t.Errorf("chromium call missing file:// URL: %v", stage2.cmd)
}
}
@@ -255,21 +266,6 @@ func TestMetadataArgs_OmitsEmptyAndOrdersStably(t *testing.T) {
}
}
func TestImageTag(t *testing.T) {
cases := map[string]string{
"docker.io/pandoc/latex:latest": "pandoc/latex",
"docker.io/zenika/alpine-chrome:latest": "zenika/alpine-chrome",
"pandoc/core": "pandoc/core",
"quay.io/example/foo:v1": "example/foo",
"alpine": "alpine",
}
for in, want := range cases {
if got := imageTag(in); got != want {
t.Errorf("imageTag(%q) = %q, want %q", in, got, want)
}
}
}
func TestSingleflight_Collapses(t *testing.T) {
var g singleflightGroup
const N = 50
@@ -305,113 +301,3 @@ func contains(haystack []string, needle string) bool {
}
return false
}
// TestToolSpecPopulation: the convert entry points populate BOTH the
// Image and Binary fields of ToolSpec, so the runner-of-the-day can
// pick whichever it needs. bwrapRunner reads Binary; containerRunner
// reads Image; the call site doesn't know which is installed.
func TestToolSpecPopulation(t *testing.T) {
f := &fakeRunner{resp: []byte("ok")}
InstallRunner(f)
t.Cleanup(func() { InstallRunner(nil) })
SetImages("docker.io/pandoc/latex:1.0", "docker.io/zenika/alpine-chrome:2.0")
SetBinaries("/opt/bin/pandoc", "/opt/bin/chromium")
t.Cleanup(func() { SetImages("", ""); SetBinaries("", "") })
if _, err := ToDocx(context.Background(), []byte("# x\n"), Metadata{}); err != nil {
t.Fatalf("ToDocx: %v", err)
}
if len(f.tools) != 1 {
t.Fatalf("want 1 tool call, got %d", len(f.tools))
}
got := f.tools[0]
if got.Image != "docker.io/pandoc/latex:1.0" {
t.Errorf("Image = %q, want docker.io/pandoc/latex:1.0", got.Image)
}
if got.Binary != "/opt/bin/pandoc" {
t.Errorf("Binary = %q, want /opt/bin/pandoc", got.Binary)
}
}
// TestBwrapArgs_SandboxFlagsPresent locks in the bwrap argv shape.
// Every conversion must run with these hardening flags — the whole
// point of bwrap-as-default is that the sandbox is built into every
// invocation. A refactor that drops any of them needs to fail this
// test loudly.
func TestBwrapArgs_SandboxFlagsPresent(t *testing.T) {
args, err := buildBwrapArgs("pandoc", nil, []string{"--from=markdown", "--to=docx", "-"})
if err != nil {
t.Fatalf("buildBwrapArgs: %v", err)
}
mustHave := []string{
"--unshare-all", // net + pid + ipc + uts + cgroup
"--unshare-user-try", // user-namespace when kernel allows
"--die-with-parent", // cleanup when zddc-server exits
"--proc", // minimal /proc
"--dev", // minimal /dev
"--tmpfs", // writable /tmp scratch
"--clearenv", // no host env leaks
}
for _, flag := range mustHave {
if !contains(args, flag) {
t.Errorf("bwrap args missing sandbox flag %q: %v", flag, args)
}
}
// /usr must be bind-mounted read-only — that's how the binary
// + its dynamic libs are visible inside the sandbox. The
// "--ro-bind /usr /usr" triple must appear consecutively.
if i := indexOfTriple(args, "--ro-bind", "/usr", "/usr"); i < 0 {
t.Errorf("bwrap args missing --ro-bind /usr /usr: %v", args)
}
// Binary + caller-cmd come last, in order.
last := args[len(args)-4:]
want := []string{"pandoc", "--from=markdown", "--to=docx", "-"}
for i, w := range want {
if last[i] != w {
t.Errorf("trailing args[%d] = %q, want %q", i, last[i], w)
}
}
}
// TestBwrapArgs_MountTranslation: caller "host:target:ro" → bwrap
// "--ro-bind host target"; "host:target:rw" → "--bind host target";
// no mode segment defaults to ro (mirroring containerRunner).
func TestBwrapArgs_MountTranslation(t *testing.T) {
args, err := buildBwrapArgs("pandoc",
[]string{"/host/tpl:/tpl:ro", "/host/pdf:/pdf:rw", "/host/x:/x"},
nil)
if err != nil {
t.Fatalf("buildBwrapArgs: %v", err)
}
if i := indexOfTriple(args, "--ro-bind", "/host/tpl", "/tpl"); i < 0 {
t.Errorf("missing --ro-bind /host/tpl /tpl: %v", args)
}
if i := indexOfTriple(args, "--bind", "/host/pdf", "/pdf"); i < 0 {
t.Errorf("missing --bind /host/pdf /pdf: %v", args)
}
if i := indexOfTriple(args, "--ro-bind", "/host/x", "/x"); i < 0 {
t.Errorf("missing default-ro --ro-bind /host/x /x: %v", args)
}
}
// TestBwrapArgs_RejectsBadMountSpec: a malformed mount string fails
// fast, never reaches exec. Single-segment specs (no target) and
// unknown modes both qualify.
func TestBwrapArgs_RejectsBadMountSpec(t *testing.T) {
for _, bad := range []string{"only-host", "/h:/t:weird", ""} {
if _, err := buildBwrapArgs("pandoc", []string{bad}, nil); err == nil {
t.Errorf("expected error for malformed mount %q", bad)
}
}
}
// indexOfTriple returns the index of `a` in args such that
// args[i:i+3] == {a, b, c}, or -1.
func indexOfTriple(args []string, a, b, c string) int {
for i := 0; i+2 < len(args); i++ {
if args[i] == a && args[i+1] == b && args[i+2] == c {
return i
}
}
return -1
}

@@ -11,51 +11,45 @@ import (
"time"
)
// remoteURL is set by Probe from cfg.ConvertPodmanSocket. Empty means
// local mode.
var remoteURL atomic.Pointer[string]
// Capabilities is the snapshot of "can we convert right now?". The
// only hard requirement is a container runtime reachable from
// zddc-server — image presence is left to `--pull=missing` at
// conversion time, so a missing image surfaces as a normal
// ConvertError (not a probe failure).
// Capabilities is the snapshot the convert-health endpoint reports
// and the convert entry points consult before exec'ing.
//
// Mode applies to OCI engines (podman/docker): "local" when the
// engine creates containers in the same process as zddc-server,
// "remote" when zddc-server is the client of a podman-system-service
// sidecar. The bwrap engine has no mode (always direct exec).
// In the runtime-image model, "Ready" means both binaries
// (pandoc + chromium) are present on PATH. Sandboxing + resource
// limits live in the wrapper scripts those names resolve to and
// are outside zddc-server's concern. The probe doesn't try to
// validate the wrappers; if one is broken, the first conversion
// surfaces the failure as a ConvertError with the wrapper's stderr.
type Capabilities struct {
Engine string // "bwrap" | "podman" | "docker" | ""
EngineVer string // first line of "<engine> --version"
Mode string // "local" or "remote" (OCI engines only)
RemoteURL string // populated in remote mode (OCI engines only)
PandocImage string // resolved pandoc image ref (OCI engines)
ChromiumImage string // resolved chromium image ref (OCI engines)
PandocBinary string // resolved path, e.g. /usr/local/bin/pandoc
PandocVersion string // first line of "pandoc --version"
ChromiumBinary string // resolved path, e.g. /usr/local/bin/chromium-browser
ChromiumVersion string // first line of "chromium-browser --version"
ProbedAt time.Time
Err error
}
// Ready reports whether conversions can be attempted. The first
// conversion may still fail if the configured binary or image isn't
// actually present (the runner will surface a clear error from the
// child process's stderr).
// Ready reports whether conversions can be attempted.
func (c Capabilities) Ready() bool {
return c.Engine != "" && c.Err == nil
return c.PandocBinary != "" && c.ChromiumBinary != "" && c.Err == nil
}
// Reason returns a short human-friendly explanation when Ready() is
// false. Used as the body of a 503.
func (c Capabilities) Reason() string {
if c.Engine == "" {
return "no conversion sandbox found (looked for bwrap, podman, docker on PATH)"
}
if c.Err != nil {
if c.Mode == "remote" {
return fmt.Sprintf("podman remote socket unreachable (%s): %s", c.RemoteURL, c.Err.Error())
}
return c.Err.Error()
}
var missing []string
if c.PandocBinary == "" {
missing = append(missing, "pandoc")
}
if c.ChromiumBinary == "" {
missing = append(missing, "chromium-browser")
}
if len(missing) > 0 {
return fmt.Sprintf("conversion binary not found on PATH: %s — runtime image is missing the conversion toolchain (see zddc/runtime.Containerfile)", strings.Join(missing, ", "))
}
return "unavailable"
}
@@ -74,187 +68,75 @@ func Available() (Capabilities, bool) {
return *p, p.Ready()
}
// SetRemoteURL installs the podman remote socket URL for subsequent
// Probe / Reprobe calls. Empty means "local mode" (the engine binary
// creates containers in the same process). Called from
// cmd/zddc-server/main.go after flag parsing, before Probe.
func SetRemoteURL(url string) {
s := url
remoteURL.Store(&s)
}
func currentRemoteURL() string {
if p := remoteURL.Load(); p != nil {
return *p
}
return ""
}
// Probe locates the container engine and installs a containerRunner
// as the package default. Call once at server startup. Returns the
// captured Capabilities for logging.
// Probe resolves the conversion binaries on PATH and installs the
// localRunner. Call once at server startup. Returns the captured
// Capabilities for logging.
//
// Engine order: engineOverride (if non-empty) → podman → docker. First
// hit wins. Image presence is NOT probed: the runner uses
// `--pull=missing` so the first conversion request will pull whichever
// image it needs.
// Image responsibility: the binaries on PATH should be the wrapper
// scripts at /usr/local/bin/{pandoc,chromium-browser} (shipped by
// zddc/runtime.Containerfile). Each wrapper handles cgroup setup
// + bwrap sandbox + exec of the real binary at /usr/bin/<name>.
// If an operator runs zddc-server outside the runtime image with
// raw pandoc / chromium on PATH, the conversion still works but
// without the per-call sandbox + resource caps.
//
// In remote mode (SetRemoteURL with non-empty URL), the probe also
// invokes `<engine> --remote --url=<url> version` to confirm the
// sidecar's socket is reachable. A reachable-engine-but-unreachable-
// socket state surfaces as Ready=false so conversion requests serve
// 503 until the sidecar comes up.
//
// Any failure here is non-fatal: the server still starts, conversion
// Failure here is non-fatal: the server still starts, conversion
// endpoints just return 503.
func Probe(ctx context.Context, engineOverride string) Capabilities {
func Probe(ctx context.Context) Capabilities {
probeCool.Lock()
defer probeCool.Unlock()
now := time.Now()
rURL := currentRemoteURL()
c := Capabilities{
PandocImage: currentPandocImage(),
ChromiumImage: currentChromiumImage(),
Mode: "local",
RemoteURL: rURL,
ProbedAt: now,
c := Capabilities{ProbedAt: time.Now()}
pandocBin := currentPandocBinary()
chromiumBin := currentChromiumBinary()
if p, err := exec.LookPath(pandocBin); err == nil {
c.PandocBinary = p
if v, err := probeVersion(ctx, p); err == nil {
c.PandocVersion = v
}
}
if p, err := exec.LookPath(chromiumBin); err == nil {
c.ChromiumBinary = p
if v, err := probeVersion(ctx, p); err == nil {
c.ChromiumVersion = v
}
if rURL != "" {
c.Mode = "remote"
}
enginePath := resolveEngine(engineOverride)
if enginePath == "" {
c.Err = fmt.Errorf("no conversion sandbox found (tried: %s)", strings.Join(enginesTried(engineOverride), ", "))
if c.PandocBinary == "" || c.ChromiumBinary == "" {
c.Err = fmt.Errorf("%s", c.Reason())
caps.Store(&c)
slog.Warn("convert: probe failed", "reason", c.Err.Error())
return c
}
kind := engineKind(enginePath)
c.Engine = kind
if v, err := probeVersion(ctx, enginePath); err == nil {
c.EngineVer = v
}
// bwrap engine: no remote-mode concept, just install the runner.
// The bwrap binary IS the sandbox; conversion binaries (pandoc,
// chromium) are resolved separately from PATH at call time and
// reported by the convert-health endpoint when ready.
if kind == "bwrap" {
InstallRunner(newBwrapRunner(enginePath))
InstallRunner(newLocalRunner())
caps.Store(&c)
slog.Info("convert: ready",
"engine", kind,
"engine_path", enginePath,
"engine_version", c.EngineVer,
"pandoc_binary", currentPandocBinary(),
"chromium_binary", currentChromiumBinary())
"pandoc_binary", c.PandocBinary,
"pandoc_version", c.PandocVersion,
"chromium_binary", c.ChromiumBinary,
"chromium_version", c.ChromiumVersion)
return c
}
// Legacy OCI engine (podman/docker). Optional remote-socket
// connectivity check, then install containerRunner.
if rURL != "" {
if err := probeRemoteSocket(ctx, enginePath, rURL); err != nil {
c.Err = err
caps.Store(&c)
slog.Warn("convert: remote socket probe failed",
"engine", kind, "remote_url", rURL, "err", err)
return c
}
}
InstallRunner(newContainerRunner(enginePath, rURL))
caps.Store(&c)
slog.Info("convert: ready",
"engine", kind,
"engine_path", enginePath,
"engine_version", c.EngineVer,
"mode", c.Mode,
"remote_url", c.RemoteURL,
"pandoc_image", c.PandocImage,
"chromium_image", c.ChromiumImage)
return c
}
// probeRemoteSocket runs `<engine> --remote --url=<url> version` with
// a short timeout. Returns nil on success; a wrapped error otherwise.
// The remote URL is typically a Unix socket path
// (unix:///var/run/podman/podman.sock) in the sidecar pattern but a
// TCP form (tcp://host:port) is accepted too.
func probeRemoteSocket(ctx context.Context, engine, url string) error {
c := exec.CommandContext(ctx, engine, "--remote", "--url="+url, "version", "--format={{.Client.Version}}")
out, err := c.CombinedOutput()
if err != nil {
return fmt.Errorf("podman --remote version: %w (output: %s)", err, strings.TrimSpace(string(out)))
}
return nil
}
// Reprobe re-runs Probe with the existing configuration. Used by the
// handler when a request hits a not-Ready state — gives the operator
// a way to recover (e.g. installed podman after the server started)
// without a server restart. Cooldown of 60 s between probes to keep
// error-path requests cheap.
func Reprobe(ctx context.Context, engineOverride string) Capabilities {
// Reprobe re-runs Probe with the existing configuration. Used by
// the handler when a request hits a not-Ready state — gives the
// operator a way to recover (e.g. installed pandoc after server
// start) without a server restart. Cooldown of 60 s between probes
// to keep error-path requests cheap.
func Reprobe(ctx context.Context) Capabilities {
if p := caps.Load(); p != nil {
if time.Since(p.ProbedAt) < 60*time.Second {
return *p
}
}
return Probe(ctx, engineOverride)
return Probe(ctx)
}
func resolveEngine(override string) string {
if override != "" {
if p, err := exec.LookPath(override); err == nil {
return p
}
return ""
}
// Probe order: bwrap (production default — lightest sandbox, no
// daemon, no OCI engine), then podman / docker as legacy fallbacks
// for hosts that already have a container engine and want OCI-image
// isolation per conversion.
for _, name := range []string{"bwrap", "podman", "docker"} {
if p, err := exec.LookPath(name); err == nil {
return p
}
}
return ""
}
func enginesTried(override string) []string {
if override != "" {
return []string{override}
}
return []string{"bwrap", "podman", "docker"}
}
// engineKind returns the engine-family label for a resolved binary
// path. "bwrap" is its own engine; "podman" and "docker" are the
// OCI-container engines handled by containerRunner. Used by Probe to
// pick the right Runner implementation.
func engineKind(resolved string) string {
base := resolved
if i := strings.LastIndex(base, "/"); i >= 0 {
base = base[i+1:]
}
switch base {
case "bwrap":
return "bwrap"
case "podman", "podman-remote":
return "podman"
case "docker":
return "docker"
}
return base
}
func probeVersion(ctx context.Context, engine string) (string, error) {
c := exec.CommandContext(ctx, engine, "--version")
func probeVersion(ctx context.Context, binary string) (string, error) {
c := exec.CommandContext(ctx, binary, "--version")
out, err := c.CombinedOutput()
if err != nil {
return "", err

@@ -10,60 +10,45 @@ import (
"os"
"os/exec"
"path/filepath"
"strings"
"sync"
"time"
)
// ToolSpec identifies the conversion tool to invoke. Runners pick
// whichever field applies to them:
// Runner executes a conversion binary and returns its stdout. The
// production implementation (localRunner) just exec's the binary
// directly. Tests use a fake.
//
// - bwrapRunner uses Binary — the path or PATH-name of the tool on
// the zddc-server host (or container). pandoc/latex's entrypoint
// becomes `pandoc`; alpine-chrome's becomes `chromium-browser`.
// This is the production-default engine: lightest sandbox, no
// daemon, no privileged outer container.
// binary is the PATH-resolvable name (or absolute path) of the
// conversion tool — typically "pandoc" or "chromium-browser". In the
// production runtime image those names resolve to wrapper scripts at
// /usr/local/bin/ that put the real binary into a cgroup + bwrap
// sandbox before exec'ing it. From zddc-server's perspective, that
// indirection is invisible: it just sees pandoc behavior.
//
// - containerRunner uses Image — the OCI image ref pulled into a
// fresh container for each conversion (legacy/fallback engine,
// kept for environments that already host a podman/docker daemon
// and want OCI-image isolation per conversion).
// stdin is piped to the binary's stdin. scratchDir is an optional
// host directory the binary needs to read from / write to (template
// + intermediate HTML + PDF output); passed to the child via the
// ZDDC_SCRATCH env var, which the wrapper script bind-mounts into
// the sandbox at the same path. Empty means "no scratch dir
// needed" (DOCX flow — stdin to stdout, no files).
//
// Both fields are populated by the entry points in convert.go so a
// single call site works regardless of which engine is installed.
type ToolSpec struct {
Image string // OCI image ref (containerRunner)
Binary string // binary name on PATH (bwrapRunner)
}
// Runner executes a conversion sub-process and returns its stdout.
// The host-side implementations are bwrapRunner (default; wraps
// `bubblewrap`) and containerRunner (fallback; wraps `podman run` /
// `docker run`). Tests use a fake.
// cmd is the argv passed to the binary. Same shape across all
// runners; no shell quoting; no engine-specific flags.
//
// stdin is piped to the tool's stdin. cmd is the argv passed *to the
// tool* — for pandoc the entrypoint accepts pandoc flags directly;
// for chromium it accepts chromium-browser flags. mounts is a list
// of "<hostPath>:<targetPath>" specs (":ro" is added if no mode
// segment is present); each runner translates them to its own
// bind/--volume syntax.
//
// All exec calls in this package go through Runner.Run. This is the
// first os/exec site in the codebase; the hardening here is the
// pattern for future shell-outs.
// All exec calls in this package go through Runner.Run.
type Runner interface {
Run(ctx context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error)
Run(ctx context.Context, binary string, stdin []byte, scratchDir string, cmd []string) ([]byte, error)
}
// ErrUnavailable means no container runtime is present on the host.
// Handlers translate to HTTP 503.
// ErrUnavailable means the conversion binary couldn't be found on
// PATH. Handlers translate to HTTP 503.
var ErrUnavailable = errors.New("conversion unavailable")
// ConvertError carries the failure surface from a non-zero exit.
// Stderr is captured (truncated to 4 KiB by the runner) so callers can
// surface pandoc/chromium's own complaint.
// Stderr is captured (truncated to 4 KiB by the runner) so callers
// can surface the binary's own complaint.
type ConvertError struct {
Tool string // image name fragment, used only for logging
Tool string // binary name, used only for logging
ExitCode int
Stderr string
Cause error
@ -74,78 +59,154 @@ func (e *ConvertError) Error() string {
return "<nil>"
}
if e.Stderr != "" {
return fmt.Sprintf("%s exit %d: %s", e.Tool, e.ExitCode, strings.TrimSpace(e.Stderr))
return fmt.Sprintf("%s exit %d: %s", e.Tool, e.ExitCode, e.Stderr)
}
return fmt.Sprintf("%s exit %d: %v", e.Tool, e.ExitCode, e.Cause)
}
func (e *ConvertError) Unwrap() error { return e.Cause }
// containerRunner runs each conversion inside a fresh container.
// The engine ("podman" preferred, "docker" fallback) is resolved once
// at startup by Probe. Resource limits are configurable via
// SetLimits (called from main.go after flag parsing). Images are passed
// per call so the same runner handles both pandoc and chromium
// invocations.
// localRunner exec's the conversion binary directly. The runtime
// image's wrapper script (at /usr/local/bin/<binary>) handles
// sandboxing + resource limits BETWEEN this exec and the real
// binary — invisible to this Runner.
//
// Two modes:
//
// - **local** (remoteURL=""): the engine binary creates containers
// directly on the host that runs zddc-server. Used for bare-metal
// and host-podman deployments. Requires podman or docker on PATH.
//
// - **remote** (remoteURL="unix:///var/run/podman/podman.sock" or
// similar): the engine binary is the local podman CLIENT, invoked
// as `podman --remote --url=<remoteURL> run …`; the actual
// container creation happens in whatever process owns the socket
// (typically a `podman system service` sidecar in the same pod).
// Used for the Kubernetes sidecar pattern so zddc-server's own
// pod stays unprivileged. Bind-mount paths must resolve identically
// on both sides — see scratchDir.
//
// The runner relies on `--pull=missing` so the operator never has to
// pre-pull images: the first request that needs an image pulls it,
// subsequent requests use the local cache. Both podman and docker
// honour this flag identically.
type containerRunner struct {
// Resource limits stored here are advisory only; the wrapper reads
// them via env (ZDDC_CONV_MEM_MAX, ZDDC_CONV_PIDS_MAX) and applies
// them to its transient cgroup. Wall-clock timeout IS enforced
// here via context.WithTimeout.
type localRunner struct {
mu sync.RWMutex
engine string
remoteURL string
memMiB int
cpus string
pids int
timeout time.Duration
}
func newLocalRunner() *localRunner {
return &localRunner{
memMiB: 1024, // 1 GiB — matches the wrapper's default
pids: 256,
timeout: 60 * time.Second,
}
}
// SetLimits updates the resource ceilings advertised to the wrapper
// script via env vars + the wall-clock timeout enforced here.
// Zero values keep the previous setting (or constructor defaults).
// Safe to call from multiple goroutines.
func (lr *localRunner) SetLimits(memMiB int, pids int, timeout time.Duration) {
lr.mu.Lock()
defer lr.mu.Unlock()
if memMiB > 0 {
lr.memMiB = memMiB
}
if pids > 0 {
lr.pids = pids
}
if timeout > 0 {
lr.timeout = timeout
}
}
func (lr *localRunner) Run(ctx context.Context, binary string, stdin []byte, scratchDir string, cmd []string) ([]byte, error) {
lr.mu.RLock()
memMiB := lr.memMiB
pids := lr.pids
timeout := lr.timeout
lr.mu.RUnlock()
if binary == "" {
return nil, ErrUnavailable
}
runCtx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
c := exec.CommandContext(runCtx, binary, cmd...)
c.Cancel = func() error {
if c.Process == nil {
return nil
}
return c.Process.Kill()
}
c.WaitDelay = 2 * time.Second
c.SysProcAttr = sysProcAttr()
// Minimal env passed to the wrapper. The wrapper does
// --clearenv inside the bwrap sandbox so the real binary
// sees only what bwrap re-injects (HOME, PATH, LANG). These
// vars are read by the WRAPPER itself, not the binary, to
// drive its cgroup setup + scratch-dir bind mount.
env := []string{
"PATH=" + os.Getenv("PATH"),
"HOME=" + os.TempDir(),
fmt.Sprintf("ZDDC_CONV_MEM_MAX=%dM", memMiB),
fmt.Sprintf("ZDDC_CONV_PIDS_MAX=%d", pids),
}
if scratchDir != "" {
env = append(env, "ZDDC_SCRATCH="+scratchDir)
}
c.Env = env
c.Stdin = bytes.NewReader(stdin)
var stdoutBuf bytes.Buffer
c.Stdout = &limitWriter{w: &stdoutBuf, max: 128 << 20}
stderr := newRingWriter(4 << 10)
c.Stderr = stderr
if err := c.Run(); err != nil {
exitCode := -1
// errors.As / errors.Is also match wrapped errors.
var ee *exec.ExitError
if errors.As(err, &ee) {
exitCode = ee.ExitCode()
}
if errors.Is(runCtx.Err(), context.DeadlineExceeded) {
return nil, &ConvertError{
Tool: binary,
ExitCode: exitCode,
Stderr: stderr.String(),
Cause: fmt.Errorf("timeout after %s: %w", timeout, runCtx.Err()),
}
}
return nil, &ConvertError{
Tool: binary,
ExitCode: exitCode,
Stderr: stderr.String(),
Cause: err,
}
}
return stdoutBuf.Bytes(), nil
}
var (
// shared default runner, populated by InstallRunner (called from
// the health probe at startup once the engine is known).
// the health probe at startup once the binaries are confirmed).
defaultRunnerMu sync.RWMutex
defaultRunner Runner
)
// InstallRunner sets the package-level Runner used by ToDocx/ToHTML/ToPDF.
// Tests inject a fake; production code lets the health probe install a
// containerRunner. Safe to call from multiple goroutines.
// InstallRunner sets the package-level Runner used by ToDocx/ToHTML/
// ToPDF. Tests inject a fake; production code lets the health probe
// install a localRunner. Safe to call from multiple goroutines.
func InstallRunner(r Runner) {
defaultRunnerMu.Lock()
defaultRunner = r
defaultRunnerMu.Unlock()
}
// ConfigureLimits applies resource limits to the package-level Runner,
// if it's a containerRunner. No-op when no runner is installed yet
// (the probe failed) or when the installed runner doesn't accept
// ConfigureLimits applies resource limits to the package-level
// Runner, if it's a localRunner. No-op when no runner is installed
// yet (the probe failed) or when the installed runner doesn't accept
// limits (e.g. a test fake). Zero values keep the previous setting.
//
// Called from cmd/zddc-server/main.go after Probe so the limits from
// the operator's flags take effect before any conversion request lands.
func ConfigureLimits(memMiB int, cpus string, pids int, timeout time.Duration) {
// Called from cmd/zddc-server/main.go after Probe so the limits
// from the operator's flags take effect before any conversion
// request lands.
func ConfigureLimits(memMiB int, pids int, timeout time.Duration) {
defaultRunnerMu.RLock()
r := defaultRunner
defaultRunnerMu.RUnlock()
if cr, ok := r.(*containerRunner); ok {
cr.SetLimits(memMiB, cpus, pids, timeout)
if lr, ok := r.(*localRunner); ok {
lr.SetLimits(memMiB, pids, timeout)
}
}
@ -156,428 +217,8 @@ func currentRunner() Runner {
return r
}
// SetLimits updates the resource ceilings used for subsequent Run
// invocations. Zero values keep the previous setting (or the defaults
// set at construction). Safe to call from multiple goroutines.
func (cr *containerRunner) SetLimits(memMiB int, cpus string, pids int, timeout time.Duration) {
cr.mu.Lock()
defer cr.mu.Unlock()
if memMiB > 0 {
cr.memMiB = memMiB
}
if cpus != "" {
cr.cpus = cpus
}
if pids > 0 {
cr.pids = pids
}
if timeout > 0 {
cr.timeout = timeout
}
}
func newContainerRunner(engine, remoteURL string) *containerRunner {
return &containerRunner{
engine: engine,
remoteURL: remoteURL,
memMiB: 512,
cpus: "2",
pids: 100,
timeout: 30 * time.Second,
}
}
// Run executes one container invocation. cmd is the argv passed to the
// image's entrypoint (pandoc for pandoc/latex, chromium-browser for
// alpine-chrome). mounts is a list of "<hostPath>:<containerPath>"
// strings; ":ro" is appended when no mode segment is present. stdin is
// piped to the container, stdout is returned as bytes (capped at
// 128 MiB).
//
// Hardening:
// - --pull=missing: image is fetched on first use, cached after.
// Operator only needs podman/docker installed; no manual pull.
// - --rm: container is removed on exit, even if killed.
// - --network=none: no network inside the container. Prevents data
// exfiltration through embedded URLs in source documents.
// - --read-only + tmpfs on /tmp and /run: image fs is immutable;
// pandoc/chromium scratch goes to tmpfs only.
// - --memory / --cpus / --pids-limit: kernel-enforced caps.
// - --cap-drop=ALL + --security-opt=no-new-privileges: standard
// container-escape hardening.
// - context-cancel kill + WaitDelay: a wedged podman gets force-
// killed; pipes drop after 2s so we don't leak goroutines.
// - cmd.Env minimal: only PATH + HOME are passed through to the
// engine binary; the container itself sees only what the image
// bakes in plus what --env adds (HOME=/tmp).
//
// Note: --user is intentionally NOT set so each image uses its
// default user (pandoc/latex runs as root, alpine-chrome runs as
// uid 1000). With --read-only + tmpfs + --cap-drop=ALL +
// --network=none + --no-new-privileges the additional defense from
// forcing nobody is small and would break alpine-chrome's own
// user-data-dir layout.
func (cr *containerRunner) Run(ctx context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error) {
cr.mu.RLock()
engine := cr.engine
remoteURL := cr.remoteURL
memMiB := cr.memMiB
cpus := cr.cpus
pids := cr.pids
timeout := cr.timeout
cr.mu.RUnlock()
if engine == "" {
return nil, ErrUnavailable
}
image := tool.Image
if image == "" {
return nil, fmt.Errorf("convert.Run: tool.Image is empty (containerRunner requires an OCI image ref)")
}
runCtx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
// Client args. In remote mode, prepend --remote and --url so the
// podman CLI dispatches the request to the sidecar's
// `podman system service` instead of creating a container locally.
// The remaining flags (--rm, --pull=missing, etc.) apply to the
// container that the remote daemon will create — same wire format
// as local mode.
var args []string
if remoteURL != "" {
args = append(args, "--remote", "--url="+remoteURL)
}
args = append(args,
"run",
"--rm",
"--pull=missing",
"-i",
)
// --userns=host only in local mode: needed when zddc-server itself
// is the one running podman inside a Kubernetes pod, because the
// kernel won't let an inner rootless podman set up its own userns
// via newuidmap. In remote (sidecar) mode the sidecar runs as root
// and creates the inner container in its own (rootful) namespace,
// so --userns=host is unnecessary and potentially noisy.
if remoteURL == "" {
args = append(args, "--userns=host")
}
args = append(args,
"--network=none",
"--read-only",
// /tmp must be large enough to host chromium's shared-memory
// fallback (--disable-dev-shm-usage redirects /dev/shm writes
// here) plus the user-data-dir. 256 MiB is plenty for the
// HTML→PDF flow; pandoc itself uses almost none.
"--tmpfs=/tmp:size=256m,exec",
"--tmpfs=/run:size=4m",
fmt.Sprintf("--memory=%dm", memMiB),
fmt.Sprintf("--cpus=%s", cpus),
fmt.Sprintf("--pids-limit=%d", pids),
"--cap-drop=ALL",
"--security-opt=no-new-privileges",
"--env=HOME=/tmp",
"--workdir=/tmp",
)
for _, m := range mounts {
if !strings.Contains(m, ":ro") && !strings.Contains(m, ":rw") {
m += ":ro"
}
args = append(args, "--volume="+m)
}
args = append(args, image)
args = append(args, cmd...)
c := exec.CommandContext(runCtx, engine, args...)
c.Cancel = func() error {
if c.Process == nil {
return nil
}
return c.Process.Kill()
}
c.WaitDelay = 2 * time.Second
c.SysProcAttr = sysProcAttr()
c.Env = []string{
"PATH=" + os.Getenv("PATH"),
"HOME=" + os.TempDir(),
}
c.Stdin = bytes.NewReader(stdin)
var stdoutBuf bytes.Buffer
c.Stdout = &limitWriter{w: &stdoutBuf, max: 128 << 20}
stderr := newRingWriter(4 << 10)
c.Stderr = stderr
err := c.Run()
if err != nil {
exitCode := -1
if ee, ok := err.(*exec.ExitError); ok {
exitCode = ee.ExitCode()
}
toolName := imageTag(image)
if runCtx.Err() == context.DeadlineExceeded {
return nil, &ConvertError{
Tool: toolName,
ExitCode: exitCode,
Stderr: stderr.String(),
Cause: fmt.Errorf("timeout after %s: %w", timeout, runCtx.Err()),
}
}
return nil, &ConvertError{
Tool: toolName,
ExitCode: exitCode,
Stderr: stderr.String(),
Cause: err,
}
}
return stdoutBuf.Bytes(), nil
}
// ───────────────────────────────────────────────────────────────────────────
// bwrapRunner — default conversion engine.
//
// Wraps `bubblewrap` to run pandoc / chromium binaries directly in a
// per-call Linux-namespace sandbox. No daemon, no OCI images, no
// privileged outer container. Image-build bundles pandoc + chromium
// into the zddc-server image so the binaries are available on PATH;
// each conversion gets a fresh set of namespaces, a read-only view
// of the host's /usr (so the binary + its libs are visible), a tmpfs
// /tmp, and nothing else.
//
// This matches the threat model of the legacy containerRunner —
// untrusted source-markdown drives the binary, we contain any
// resulting RCE inside the bwrap sandbox — without the operational
// tax of running a container engine per conversion (image pull,
// daemon, socket, ~300ms startup).
//
// Hardening (mirror of containerRunner's flags):
// - --unshare-all + --share-net=off via omission → no network
// - --unshare-user-try → user namespace when kernel allows it
// - --die-with-parent → cleanup on zddc-server exit
// - --ro-bind /usr /usr, /lib /lib, /lib64 /lib64, /etc /etc, /bin /bin
// (where present) → tools + libs visible read-only
// - --proc /proc, --dev /dev → minimal pseudo-filesystems
// - --tmpfs /tmp (256 MiB) → scratch space, matches container path
// - --chdir /tmp → workdir
// - --clearenv + minimal HOME/PATH/LANG → no host env leaks
// - --cap-drop ALL (bwrap default, explicit for clarity)
// ───────────────────────────────────────────────────────────────────────────
type bwrapRunner struct {
mu sync.RWMutex
bin string // path to bwrap binary
memMiB int // currently advisory; bwrap has no built-in cap
cpus string // currently advisory
pids int // currently advisory
timeout time.Duration // context deadline per Run
}
func newBwrapRunner(bin string) *bwrapRunner {
return &bwrapRunner{
bin: bin,
memMiB: 512,
cpus: "2",
pids: 100,
timeout: 30 * time.Second,
}
}
// SetLimits — same shape as containerRunner.SetLimits. bwrap itself
// doesn't enforce cgroup limits; we capture the values so an operator
// can read them back via /.profile/config or the convert-health probe.
// Wrapping with systemd-run --scope --property MemoryMax=… is the
// follow-up if hard caps are needed; not in this iteration.
func (br *bwrapRunner) SetLimits(memMiB int, cpus string, pids int, timeout time.Duration) {
br.mu.Lock()
defer br.mu.Unlock()
if memMiB > 0 {
br.memMiB = memMiB
}
if cpus != "" {
br.cpus = cpus
}
if pids > 0 {
br.pids = pids
}
if timeout > 0 {
br.timeout = timeout
}
}
func (br *bwrapRunner) Run(ctx context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error) {
br.mu.RLock()
bwrapBin := br.bin
timeout := br.timeout
br.mu.RUnlock()
if bwrapBin == "" {
return nil, ErrUnavailable
}
if tool.Binary == "" {
return nil, fmt.Errorf("convert.Run: tool.Binary is empty (bwrapRunner requires a host-binary name)")
}
runCtx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
args, err := buildBwrapArgs(tool.Binary, mounts, cmd)
if err != nil {
return nil, err
}
c := exec.CommandContext(runCtx, bwrapBin, args...)
c.Cancel = func() error {
if c.Process == nil {
return nil
}
return c.Process.Kill()
}
c.WaitDelay = 2 * time.Second
c.SysProcAttr = sysProcAttr()
c.Env = []string{
"PATH=" + os.Getenv("PATH"),
"HOME=" + os.TempDir(),
}
c.Stdin = bytes.NewReader(stdin)
var stdoutBuf bytes.Buffer
c.Stdout = &limitWriter{w: &stdoutBuf, max: 128 << 20}
stderr := newRingWriter(4 << 10)
c.Stderr = stderr
if runErr := c.Run(); runErr != nil {
exitCode := -1
if ee, ok := runErr.(*exec.ExitError); ok {
exitCode = ee.ExitCode()
}
toolName := tool.Binary
if runCtx.Err() == context.DeadlineExceeded {
return nil, &ConvertError{
Tool: toolName,
ExitCode: exitCode,
Stderr: stderr.String(),
Cause: fmt.Errorf("timeout after %s: %w", timeout, runCtx.Err()),
}
}
return nil, &ConvertError{
Tool: toolName,
ExitCode: exitCode,
Stderr: stderr.String(),
Cause: runErr,
}
}
return stdoutBuf.Bytes(), nil
}
// buildBwrapArgs assembles the bwrap argv for a single conversion.
// Exposed as a package-internal helper so tests can lock the sandbox
// flag shape without exec'ing bwrap. Returns an error when a mount
// spec is malformed.
func buildBwrapArgs(binary string, mounts, cmd []string) ([]string, error) {
args := []string{
// Namespace isolation. --unshare-all unshares user (when
// available), ipc, pid, net, uts, cgroup; --unshare-user-try
// downgrades cleanly when the kernel refuses (e.g. some
// container hosts disable user-namespace creation).
"--unshare-all",
"--unshare-user-try",
"--die-with-parent",
// Read-only system view. Each --ro-bind only mounts paths
// that exist on the host; for hosts where /lib is a symlink
// into /usr/lib (modern Linux) the symlink resolution lets
// bwrap mount /usr's contents through.
"--ro-bind", "/usr", "/usr",
"--ro-bind-try", "/lib", "/lib",
"--ro-bind-try", "/lib64", "/lib64",
"--ro-bind-try", "/bin", "/bin",
"--ro-bind-try", "/sbin", "/sbin",
"--ro-bind-try", "/etc", "/etc",
// Pseudo-filesystems. /proc and /dev are required for any
// non-trivial binary; we make them minimal.
"--proc", "/proc",
"--dev", "/dev",
// Scratch. 256 MiB tmpfs at /tmp matches containerRunner.
// chromium spills its shared-memory fallback (--disable-dev-
// shm-usage) here, so the budget actually matters.
"--tmpfs", "/tmp",
"--size", "268435456", // 256 MiB; applies to the most recent --tmpfs
"--chdir", "/tmp",
// Minimal env. HOME=/tmp lets chromium write its
// user-data-dir without permission errors; PATH covers the
// usual install locations for pandoc + chromium across
// alpine / debian / rhel.
"--clearenv",
"--setenv", "HOME", "/tmp",
"--setenv", "PATH", "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"--setenv", "LANG", "C.UTF-8",
}
// Caller-supplied bind mounts (template, output, …). Same
// "host:target[:ro|:rw]" syntax as containerRunner; we translate
// to bwrap's --ro-bind / --bind.
for _, m := range mounts {
host, target, mode, ok := splitMount(m)
if !ok {
return nil, fmt.Errorf("convert.Run: invalid mount spec %q (want host:target[:ro|:rw])", m)
}
if mode == "rw" {
args = append(args, "--bind", host, target)
} else {
args = append(args, "--ro-bind", host, target)
}
}
// Finally the binary + its argv. The binary path is PATH-resolved
// inside the sandbox via the constructed PATH above; if the
// operator passed an absolute path it bypasses PATH lookup and is
// invoked verbatim (still subject to the /usr bind mount).
args = append(args, binary)
args = append(args, cmd...)
return args, nil
}
// splitMount parses "host:target[:ro|:rw]" into its three parts.
// The mode segment is optional; absent means read-only (matches the
// containerRunner default).
func splitMount(m string) (host, target, mode string, ok bool) {
parts := strings.SplitN(m, ":", 3)
if len(parts) < 2 {
return "", "", "", false
}
host = parts[0]
target = parts[1]
mode = "ro"
if len(parts) == 3 {
switch parts[2] {
case "ro", "rw":
mode = parts[2]
default:
return "", "", "", false
}
}
return host, target, mode, true
}
// imageTag extracts a short name for an image reference, used as the
// "Tool" label on ConvertError. "docker.io/pandoc/latex:latest" →
// "pandoc/latex".
func imageTag(image string) string {
s := image
// Strip registry prefix.
if i := strings.Index(s, "/"); i >= 0 {
if strings.Contains(s[:i], ".") || strings.Contains(s[:i], ":") {
s = s[i+1:]
}
}
// Strip tag suffix.
if i := strings.LastIndex(s, ":"); i >= 0 {
s = s[:i]
}
return s
}
// limitWriter caps the underlying buffer at max bytes. Writes past the
// cap return io.ErrShortWrite, which surfaces as a Run() error — the
// limitWriter caps the underlying buffer at max bytes. Writes past
// the cap return an error which surfaces as a Run() error — the
// caller then maps to 422 (output too large) at the handler edge.
type limitWriter struct {
w io.Writer
@ -600,9 +241,9 @@ func (l *limitWriter) Write(p []byte) (int, error) {
return n, err
}
// ringWriter keeps only the tail of what's written — useful for stderr
// capture where the most-recent bytes are the ones with the actual
// error message and earlier output is usually progress noise.
// ringWriter keeps only the tail of what's written — useful for
// stderr capture where the most-recent bytes carry the actual error
// message and earlier output is usually progress noise.
type ringWriter struct {
mu sync.Mutex
buf []byte
@ -636,16 +277,14 @@ func (r *ringWriter) String() string {
// writeAssetsToScratch materialises the embedded viewer-template.html
// and custom.css into a fresh scratch dir and returns the host path.
// Caller is responsible for os.RemoveAll(dir) when done. Used by
// ToHTML which needs the template visible inside the container.
// ToHTML which needs the template visible inside the sandbox.
//
// scratchRoot controls where the temp dir lands. Empty means "use
// $TMPDIR" (local mode default). In remote/sidecar mode the caller
// passes the shared mount path (e.g. "/work") so the podman-service
// sidecar sees the bind-mount source at the same path.
// scratchRoot controls where the temp dir lands. Empty means
// "use $TMPDIR".
//
// Files are written world-readable so the container's default user
// (root for pandoc/latex, uid 1000 for alpine-chrome) can read them
// through the read-only bind mount regardless of the host's umask.
// Files are written world-readable so the binary's default user can
// read them through the wrapper's bind mount regardless of the
// host's umask.
func writeAssetsToScratch(scratchRoot string) (string, error) {
dir, err := os.MkdirTemp(scratchRoot, "zddc-convert-")
if err != nil {

@ -97,7 +97,7 @@ func ServeConverted(cfg config.Config, w http.ResponseWriter, r *http.Request, s
if !ok {
// One re-probe attempt — gives the operator a way to recover
// after building the image without restarting the server.
caps = convert.Reprobe(r.Context(), os.Getenv("ZDDC_CONVERT_ENGINE"))
caps = convert.Reprobe(r.Context())
if !caps.Ready() {
w.Header().Set("Retry-After", "60")
http.Error(w, "Service Unavailable — "+caps.Reason(), http.StatusServiceUnavailable)

@ -1511,7 +1511,7 @@ body.is-elevated::after {
</svg>
<div class="header-title-group">
<span class="app-header__title" id="table-title">ZDDC Table</span>
<span class="build-timestamp"><span style="color:red;font-weight:bold">v0.0.17-alpha · 2026-05-19 11:59:55 · 73e34be-dirty</span></span>
<span class="build-timestamp"><span style="color:red;font-weight:bold">v0.0.17-alpha · 2026-05-19 12:37:53 · 847e082-dirty</span></span>
</div>
</div>
<div class="header-right">

@ -1,18 +1,30 @@
# Runtime image for zddc-server.
#
# Bundles the conversion toolchain (pandoc + chromium + bubblewrap) so
# the MD→DOCX/HTML/PDF endpoint works without an external container
# engine. The convert package's bwrap engine (production default)
# sandboxes each pandoc/chromium invocation in a fresh Linux-namespace;
# no daemon, no socket, no privileged outer container, no OCI image
# pull at conversion time.
# Bundles the conversion toolchain (pandoc + chromium + bubblewrap)
# AND two runtime scripts: an init script (zddc-cgroup-init) and a
# per-call wrapper (zddc-sandbox-exec) that shadows the real binaries
# on PATH. When zddc-server exec's "pandoc" or "chromium-browser", it
# hits /usr/local/bin/<name> (a symlink to
# /usr/local/libexec/zddc-sandbox-exec), which:
#
# 1. creates a transient cgroup v2 with memory + pids caps,
# 2. drops the process into that cgroup,
# 3. wraps the real binary in a bubblewrap sandbox (private
# namespaces, read-only /usr, fresh tmpfs at /tmp, no network),
# 4. exec's /usr/bin/<name>.
#
# zddc-server's Go code is unaware of any of this — its only contract
# is "if I exec pandoc with these args, I get pandoc behavior." The
# isolation strategy lives entirely in the image; an operator who
# wants firejail / systemd-nspawn / podman-run instead just replaces
# the wrapper script and the Go code keeps working unchanged.
#
# Used by helm charts (helm/zddc-server-prod/) as the main-container
# image. The build is independent of zddc-server itself — the binary
# is built by the helm chart's init container from a pinned git ref
# and copied into this runtime image's filesystem at start. Image
# tags should track the upstream package versions (pandoc, chromium)
# more than zddc-server, since the binary is layered in at deploy time.
# image. The binary is built by the chart's init container from a
# pinned git ref and copied into a shared emptyDir; the chart's
# command is /usr/local/libexec/zddc-cgroup-init /zddc/zddc-server,
# so the cgroup v2 hierarchy is delegated before zddc-server starts
# (see runtime/zddc-cgroup-init for the "no internal processes"
# constraint that requires this indirection).
#
# Build:
# podman build -t zddc-server-runtime:latest \
@ -23,8 +35,7 @@
# codeberg.org/varasys/zddc-server-runtime:vYYYYMMDD
# podman push codeberg.org/varasys/zddc-server-runtime:vYYYYMMDD
#
# Size: ≈ 1 GB unpacked (chromium dominates). Container engines
# layer + dedupe the chromium libs across replicas on the same node.
# Size: ≈ 1 GB unpacked (chromium dominates).
FROM docker.io/library/alpine:3
RUN apk add --no-cache \
@ -34,8 +45,12 @@ RUN apk add --no-cache \
font-noto \
ca-certificates
# The init container in helm/zddc-server-*/templates/deployment.yaml
# writes the compiled zddc-server binary to /zddc/zddc-server in a
# shared emptyDir volume; the main container's command is
# `/zddc/zddc-server`. No CMD/ENTRYPOINT here because the binary
# path is provided by the chart, not baked into the image.
# Wrapper scripts. zddc-cgroup-init runs at container start to
# prepare cgroup v2 subtree_control delegation; zddc-sandbox-exec
# is invoked per-conversion via the symlinks below.
COPY runtime/zddc-cgroup-init /usr/local/libexec/zddc-cgroup-init
COPY runtime/zddc-sandbox-exec /usr/local/libexec/zddc-sandbox-exec
RUN chmod 0755 /usr/local/libexec/zddc-cgroup-init \
/usr/local/libexec/zddc-sandbox-exec \
&& ln -s /usr/local/libexec/zddc-sandbox-exec /usr/local/bin/pandoc \
&& ln -s /usr/local/libexec/zddc-sandbox-exec /usr/local/bin/chromium-browser

82
zddc/runtime/zddc-cgroup-init Executable file

@ -0,0 +1,82 @@
#!/bin/sh
# zddc-cgroup-init — prepare cgroup v2 hierarchy and exec zddc-server.
#
# The per-conversion wrapper (zddc-sandbox-exec) creates a transient
# child cgroup for each pandoc / chromium invocation, sets memory.max
# and pids.max on it, and moves the conversion process in. That only
# works when:
#
# (a) the cgroup v2 hierarchy is mounted at /sys/fs/cgroup, AND
# (b) the controllers we need (memory, pids) are enabled in the
# parent cgroup's subtree_control file, AND
# (c) the parent cgroup has NO processes in it (cgroup v2's
# "no internal processes" constraint: a non-root cgroup cannot
# hold processes AND enable controllers in subtree_control).
#
# A bare container with PID 1 in its root cgroup (usually an ordinary
# non-root cgroup on the host) violates (c). This
# init script does the one-time setup BEFORE exec'ing zddc-server:
#
# 1. mkdir /sys/fs/cgroup/zddc/ (a leaf cgroup for zddc-server)
# 2. move every PID out of root into /sys/fs/cgroup/zddc/
# 3. enable +memory +pids in root's subtree_control (now empty)
# 4. try to enable +memory +pids in zddc/'s subtree_control too
# (best-effort; the per-conversion cgroups are siblings of
# zddc/, so they depend on step 3, not on this step)
# 5. exec zddc-server (which inherits cgroup membership in zddc/)
#
# After this, the wrapper script creates /sys/fs/cgroup/conv.<pid>/
# as a sibling of /sys/fs/cgroup/zddc/, sets limits, and moves the
# pandoc/chromium process in. Each conversion gets a fresh transient
# cgroup that vanishes when the process exits.
#
# Best-effort: if any step fails (cgroup v1, undelegated subtree,
# read-only cgroupfs in some other container shape), this script
# still exec's zddc-server. The convert pipeline degrades to
# "bwrap sandbox + wall-clock timeout"; an operator notices via
# the warning log line below.
set -eu
setup_cgroup_v2() {
cgroot=/sys/fs/cgroup
[ -d "$cgroot" ] || return 1
# Detect cgroup v2 by the presence of cgroup.controllers at root.
[ -r "$cgroot/cgroup.controllers" ] || return 1
# Need memory + pids in available controllers; warn about each one
# that is missing.
for ctrl in memory pids; do
if ! grep -qw "$ctrl" "$cgroot/cgroup.controllers"; then
echo "zddc-cgroup-init: cgroup.controllers lacks '$ctrl'; per-conversion $ctrl cap will be unenforced" >&2
fi
done
# Create the leaf where zddc-server itself will live.
mkdir -p "$cgroot/zddc" || return 1
# Move every PID currently in the root cgroup into zddc/. The
# root must be empty before we can enable subtree_control.
if [ -r "$cgroot/cgroup.procs" ]; then
while read -r pid; do
[ -n "$pid" ] || continue
# Best-effort; processes can exit between read and write.
printf "%s\n" "$pid" > "$cgroot/zddc/cgroup.procs" 2>/dev/null || true
done < "$cgroot/cgroup.procs"
fi
# Enable controllers at root → makes them usable in immediate
# children (zddc/ and any sibling per-conversion cgroup).
printf "+memory +pids" > "$cgroot/cgroup.subtree_control" 2>/dev/null || {
echo "zddc-cgroup-init: could not enable +memory +pids in $cgroot/cgroup.subtree_control — caps will not apply" >&2
return 1
}
# Try to enable inside zddc/ too. zddc/ now holds processes, so the
# kernel is expected to refuse this write (EBUSY, the same "no
# internal processes" rule); hence strictly best-effort.
printf "+memory +pids" > "$cgroot/zddc/cgroup.subtree_control" 2>/dev/null || true
return 0
}
if ! setup_cgroup_v2; then
echo "zddc-cgroup-init: cgroup v2 setup unavailable — running without per-conversion caps" >&2
fi
# Hand off to zddc-server. The exec'd process lands in
# /sys/fs/cgroup/zddc/ (we moved ourselves there above). When it
# spawns the wrapper, the wrapper creates a transient sibling cgroup
# under /sys/fs/cgroup/, NOT a child of zddc/, so the conversion's
# cgroup is a peer of zddc-server's — keeping zddc-server's own
# resource accounting separate from conversion accounting.
exec "$@"
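
Reviewer note: one quick way to confirm that the delegation above took effect is to look at which controllers the root cgroup.subtree_control lists. The sketch below is illustrative only. `cg_caps_enabled` is a hypothetical helper that is not part of this commit, and it assumes the file holds space-separated controller names, as cgroup v2 writes them.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): print which of the
# two controllers the wrapper relies on are enabled in the given
# cgroup.subtree_control file, comma-separated, or "none".
cg_caps_enabled() {
    control=$1
    caps=""
    if [ -r "$control" ]; then
        for ctrl in memory pids; do
            # -w matches whole words, so "cpuset" never counts as "cpu" etc.
            if grep -qw "$ctrl" "$control"; then
                caps="${caps:+$caps,}$ctrl"
            fi
        done
    fi
    printf '%s\n' "${caps:-none}"
}
```

Calling `cg_caps_enabled /sys/fs/cgroup/cgroup.subtree_control` inside the container should print `memory,pids` when both caps can apply, and `none` in the degraded "bwrap sandbox + wall-clock timeout" mode.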

zddc/runtime/zddc-sandbox-exec Executable file

@ -0,0 +1,118 @@
#!/bin/sh
# zddc-sandbox-exec — drop-in wrapper for pandoc and chromium-browser.
#
# Invoked via symlinks at /usr/local/bin/pandoc and
# /usr/local/bin/chromium-browser. zddc-server (and any other caller
# that uses the default PATH) exec's by short name, hits this script
# first, and we transparently run the real binary at /usr/bin/<name>
# inside:
#
# 1. a transient cgroup v2 (memory + pids cap, kernel-enforced)
# 2. a bubblewrap sandbox (private namespaces, ro-bind /usr, fresh
# tmpfs at /tmp, no network)
#
# zddc-server's Go code does not know about either layer — its only
# contract with the image is "if I exec pandoc with these args, I
# get pandoc behavior back." Swap the wrapper for a different
# isolation strategy (firejail, nspawn, podman-run, raw exec) and
# nothing changes in Go.
#
# Caller-tunable env (with defaults):
#
# ZDDC_SCRATCH host directory to bind-mount read-write
# inside the sandbox at the SAME path. Set by
# zddc-server per-conversion; the markdown
# template, intermediate HTML, and chromium
# output PDF all live there. Absent = no extra
# bind mount; /tmp is a fresh tmpfs only.
# ZDDC_CONV_MEM_MAX cgroup memory.max value (default "1G").
# cgroup v2 syntax — bytes, "1G", or "max".
# ZDDC_CONV_PIDS_MAX cgroup pids.max value (default "256").
# ZDDC_CONV_TMPFS_SIZE size in bytes of bwrap's tmpfs at /tmp
# (default 268435456, i.e. 256 MiB).
set -eu
NAME=$(basename "$0")
REAL="/usr/bin/$NAME"
if [ ! -x "$REAL" ]; then
echo "zddc-sandbox-exec: $NAME — real binary not found at $REAL" >&2
exit 127
fi
# ── 1. cgroup v2 (best-effort) ──────────────────────────────────────────
#
# zddc-cgroup-init enables +memory +pids in
# /sys/fs/cgroup/cgroup.subtree_control at container start (see that
# script for the cgroup v2 "no internal processes" wrinkle that
# requires the indirection). Here we just need to mkdir a transient
# child, set caps, and move ourselves in. The real binary inherits
# cgroup membership at exec.
CG_ROOT="/sys/fs/cgroup"
CG_CONTROL="$CG_ROOT/cgroup.subtree_control"
if [ -w "$CG_CONTROL" ] && grep -qw memory "$CG_CONTROL" 2>/dev/null; then
CG="$CG_ROOT/conv.$$"
if mkdir "$CG" 2>/dev/null; then
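# rmdir on exit, for the case where the wrapper itself exits
# before exec'ing the real binary. Best-effort: rmdir fails
# while any process (this shell included) is still a member,
# and the trap does not survive the exec below. cgroup v2 does
# not remove empty cgroups on its own, so after a conversion
# the directory stays until something rmdir's it.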
trap 'rmdir "$CG" 2>/dev/null || true' EXIT INT TERM
printf "%s\n" "${ZDDC_CONV_MEM_MAX:-1G}" > "$CG/memory.max" 2>/dev/null || true
printf "%s\n" "${ZDDC_CONV_PIDS_MAX:-256}" > "$CG/pids.max" 2>/dev/null || true
printf "%s\n" "$$" > "$CG/cgroup.procs" 2>/dev/null || true
fi
fi
# ── 2. bwrap sandbox ────────────────────────────────────────────────────
#
# Mirror the hardening that internal/convert previously assembled in
# Go: unshare every namespace (--unshare-all also covers network),
# bind /usr read-only so the binary + its libs are visible, drop a
# fresh tmpfs at /tmp, clear the environment to a minimal floor.
#
# Building the bwrap argv preserves "$@" (the original pandoc /
# chromium args) by PREPENDING bwrap flags onto the existing
# positional parameters. Each `set -- new-flag "$@"` puts one flag
# at the front, so the statements below read back-to-front. The
# final argv is:
#
# bwrap --unshare-all --unshare-user-try ... -- REAL_BINARY ORIG_ARGS
#
# This is the standard POSIX-sh idiom for "build a command line
# without an array type."
set -- "$REAL" "$@" # REAL ORIG
set -- -- "$@" # -- REAL ORIG
# Optional scratch dir, prepended just before "-- REAL ORIG" so it
# lands inside the bwrap flag list:
if [ -n "${ZDDC_SCRATCH:-}" ] && [ -d "$ZDDC_SCRATCH" ]; then
set -- --bind "$ZDDC_SCRATCH" "$ZDDC_SCRATCH" "$@"
fi
# Common bwrap flags (each one prepended; final order is bottom-up).
set -- --setenv LANG C.UTF-8 "$@"
set -- --setenv PATH /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin "$@"
set -- --setenv HOME /tmp "$@"
set -- --clearenv "$@"
set -- --chdir /tmp "$@"
# bwrap's --size sets the size of the NEXT --tmpfs, so in argv order
# --size must come before --tmpfs. Building bottom-up via prepend means
# the LATER statement here lands earlier in argv: write --tmpfs first
# then --size, so the final $@ starts with "... --size N --tmpfs /tmp".
set -- --tmpfs /tmp "$@"
set -- --size "${ZDDC_CONV_TMPFS_SIZE:-268435456}" "$@"
set -- --dev /dev "$@"
set -- --proc /proc "$@"
set -- --ro-bind-try /etc /etc "$@"
set -- --ro-bind-try /sbin /sbin "$@"
set -- --ro-bind-try /bin /bin "$@"
set -- --ro-bind-try /lib64 /lib64 "$@"
set -- --ro-bind-try /lib /lib "$@"
set -- --ro-bind /usr /usr "$@"
set -- --die-with-parent "$@"
set -- --unshare-user-try "$@"
set -- --unshare-all "$@"
exec bwrap "$@"
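
Reviewer note: the prepend idiom used above can be tried without bwrap installed. The sketch below is a cut-down illustration, not the wrapper itself. `build_argv` is a hypothetical name, and the flag set and the size value 1024 are placeholders chosen for the example.

```shell
#!/bin/sh
# Hypothetical demo of the "build argv by prepending" idiom. It prints
# the final argument vector one element per line instead of exec'ing
# bwrap, so the resulting order can be inspected directly.
build_argv() {
    set -- /usr/bin/pandoc "$@"    # REAL ORIG
    set -- -- "$@"                 # -- REAL ORIG
    # Written --tmpfs first, then --size: each later statement lands
    # earlier in argv, so --size ends up in front of --tmpfs.
    set -- --tmpfs /tmp "$@"
    set -- --size 1024 "$@"
    set -- --unshare-all "$@"
    # Quoted "$@" keeps each original argument intact, spaces included.
    printf '%s\n' "$@"
}
```

Running `build_argv -o "out file.html" in.md` lists `--unshare-all` first and the caller's arguments last, with `out file.html` still a single element. That is the property the wrapper depends on when it forwards pandoc or chromium arguments unchanged.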