From cef7188a7763c225520e9123ba9c7b6e84adc9dc Mon Sep 17 00:00:00 2001
From: ZDDC
Date: Tue, 19 May 2026 07:47:58 -0500
Subject: [PATCH] refactor(convert): wrapper-in-image owns the sandbox; Go just exec's binaries
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The bwrap engine and the OCI engine that lived in
internal/convert/runner.go both leaked isolation policy into Go code.
Replaced with a single image-side wrapper that drop-in-shadows pandoc
and chromium-browser on PATH. zddc-server's only contract with the
image is now "exec.Command(name, args) gets you that tool's behavior";
sandboxing, resource caps, and namespace setup live entirely in shell
scripts shipped by the image.

Architecture:

- zddc/runtime/zddc-cgroup-init runs at container start. cgroup v2's
  "no internal processes" rule forbids a non-root cgroup from enabling
  domain controllers such as memory for its children while it still
  holds processes itself. The init script therefore moves PID 1 into a
  child cgroup, enables +memory +pids in subtree_control, then exec's
  zddc-server. Best-effort: degrades cleanly to "no resource caps" if
  cgroupfs isn't writable.
- zddc/runtime/zddc-sandbox-exec is the per-call wrapper, symlinked
  from /usr/local/bin/{pandoc,chromium-browser}. It creates a transient
  cgroup v2 (memory.max + pids.max), then bubblewrap-sandboxes the real
  binary at /usr/bin/: --unshare-all, --ro-bind /usr, --proc /proc,
  --tmpfs /tmp, --clearenv. The caller's scratch dir comes in via the
  ZDDC_SCRATCH env var and is bind-mounted at the SAME path, so
  absolute paths round-trip unchanged.

Go simplifications (~250 lines net deletion):

- Runner interface: Run(ctx, binary, stdin, scratchDir, cmd). No
  ToolSpec, no mount list, no engine concept. Single localRunner
  implementation; bwrapRunner and containerRunner are both deleted.
- health.Probe just looks up pandoc + chromium on PATH; Capabilities
  drops engine kinds.
- convert.go: ToHTML/ToPDF write to a per-call scratch dir under the
  scratch root (default $TMPDIR) and pass absolute paths; the wrapper
  bind-mounts the dir. No more "/tpl" / "/pdf" mount-point indirection.
- Config drops --convert-pandoc-image, --convert-chromium-image,
  --convert-engine, --convert-podman-socket (OCI engine gone) and
  --convert-cpus (CPU caps don't apply in the new model; wall-clock +
  memory + pids is the cap set). Defaults raised to match the new caps
  the user authorized: mem 512→1024 MiB, pids 100→256, timeout
  30→60 s.

Image:

- zddc/runtime.Containerfile builds the production runtime image
  (alpine + bubblewrap + pandoc + chromium + font-noto). Two COPY
  statements pull in the wrapper scripts; ln -s creates the
  shadow-name symlinks.
- The bitnest dev image mirrors this layout under
  /var/lib/zddc-dev-build/.

Container privilege required:

- Nested bwrap needs the outer container to permit user + mount
  namespace creation and MS_SLAVE propagation on the root mount. The
  default seccomp and AppArmor profiles block all of these. The
  Quadlet unit adds:

      --cap-add=ALL
      --security-opt=seccomp=unconfined
      --security-opt=apparmor=unconfined
      --security-opt=unmask=ALL

  The Helm chart sets the equivalent via securityContext
  (capabilities.add: SYS_ADMIN, seccompProfile.type: Unconfined,
  appArmorProfile.type: Unconfined).

Trade-off (documented in AGENTS.md): an RCE in zddc-server now has
near-root power within the container, but the bind-mount layout still
bounds the blast radius; bwrap is the real boundary between zddc-server
and untrusted markdown.

Tests: convert_test.go is fully rewritten for the new Runner signature.
Drops TestBwrapArgs_* (that functionality moved out of Go) and
TestImageTag (no more image refs). All 15 Go test packages green.

Verified live on bitnest: pandoc --version round-trips through the
wrapper with exit 0; MD→DOCX produces a valid Word 2007+ file
end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 AGENTS.md                               |  25 +-
 ARCHITECTURE.md                         |   2 +-
 .../templates/deployment.yaml           |  31 +-
 zddc/cmd/zddc-server/main.go            |  31 +-
 zddc/internal/config/config.go          |  73 +-
 zddc/internal/convert/convert.go        | 193 ++---
 zddc/internal/convert/convert_test.go   | 268 ++-----
 zddc/internal/convert/health.go         | 256 ++-----
 zddc/internal/convert/runner.go         | 675 ++++--------------
 zddc/internal/handler/converthandler.go |   2 +-
 zddc/internal/handler/tables.html       |   2 +-
 zddc/runtime.Containerfile              |  51 +-
 zddc/runtime/zddc-cgroup-init           |  82 +++
 zddc/runtime/zddc-sandbox-exec          | 118 +++
 14 files changed, 691 insertions(+), 1118 deletions(-)
 create mode 100755 zddc/runtime/zddc-cgroup-init
 create mode 100755 zddc/runtime/zddc-sandbox-exec

diff --git a/AGENTS.md b/AGENTS.md
index 2a3b55f..8bec6e9 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -345,19 +345,30 @@ The markdown editor lives at `browse/js/preview-markdown.js` and is mounted as t
 
 ## Server-side document conversion (`zddc/internal/convert`)
 
-zddc-server can convert `.md` → DOCX/HTML/PDF on demand at `GET //foo.md?convert=docx|html|pdf`. Implementation:
+zddc-server can convert `.md` → DOCX/HTML/PDF on demand at `GET //foo.md?convert=docx|html|pdf`.
 
-- **Two engines, probed bwrap → podman → docker.** The first one found on PATH wins; `--convert-engine=` / `ZDDC_CONVERT_ENGINE` forces a choice.
+**Architecture.** zddc-server's Go code does the bare minimum: it `exec.Command("pandoc", args...)` or `exec.Command("chromium-browser", args...)`. **The sandbox + resource caps live in the IMAGE**, not in Go. In the production runtime image (`zddc/runtime.Containerfile`), `/usr/local/bin/pandoc` and `/usr/local/bin/chromium-browser` are symlinks to `zddc-sandbox-exec` — a shell wrapper that:
 
-  - **bwrap (production default).** Wraps `bubblewrap` to run `pandoc` and `chromium-browser` directly in a per-call Linux-namespace sandbox: `--unshare-all --unshare-user-try --die-with-parent --ro-bind /usr /usr ... --proc /proc --dev /dev --tmpfs /tmp --clearenv`. No daemon, no socket, no OCI image pull at conversion time. Binaries are baked into the zddc-server runtime image (`zddc/runtime.Containerfile`) so the operator just runs the image. Configure binary names via `--convert-pandoc-binary` (default `pandoc`) / `--convert-chromium-binary` (default `chromium-browser`; debian/ubuntu installs as `chromium`).
+1. Creates a transient cgroup v2 (memory + pids cap from `ZDDC_CONV_MEM_MAX` / `ZDDC_CONV_PIDS_MAX` env), moves itself in.
+2. Wraps the real binary at `/usr/bin/` in a bubblewrap sandbox (`--unshare-all --unshare-user-try --die-with-parent --ro-bind /usr /usr ... --proc /proc --dev /dev --tmpfs /tmp --clearenv`).
+3. exec's `/usr/bin/` with the original argv.
 
-  - **podman / docker (legacy fallback).** Wraps `podman run` / `docker run` with `--rm --pull=missing --network=none --read-only --tmpfs=/tmp:size=256m,exec --memory --cpus --pids-limit --cap-drop=ALL --security-opt=no-new-privileges --env=HOME=/tmp`. Used when the operator wants OCI-image isolation per conversion and already has an engine on PATH. Default images `docker.io/pandoc/latex:latest` (override via `--convert-pandoc-image=` / `ZDDC_CONVERT_PANDOC_IMAGE`) and `docker.io/zenika/alpine-chrome:latest` (override via `--convert-chromium-image=`).
+Why this shape: swapping isolation strategies (firejail, systemd-nspawn, podman-run, raw exec for dev) is purely an image concern. The Go code never changed. A separate `zddc-cgroup-init` script runs at container start to delegate cgroup v2 `subtree_control` (the "no internal processes" constraint), then exec's zddc-server. Both scripts live in `zddc/runtime/`.
 
-- Resource caps via `--convert-mem-mib` (default 512), `--convert-cpus` (default "2"), `--convert-pids` (default 100), `--convert-timeout` (default 30s). bwrap stores them advisorily (no cgroup enforcement in this iteration); the OCI engine maps them to `--memory` / `--cpus` / `--pids-limit`.
-- I/O via bind mount + stdin/stdout. Pandoc reads markdown from stdin, writes to stdout. The viewer template is bind-mounted read-only at `/tpl`. Chromium reads HTML from a read-write bind mount at `/pdf` and writes the PDF to the same mount; the host reads it back. Mount-spec syntax (`host:target[:ro|:rw]`) is identical across engines; the runner translates to `--ro-bind` / `--bind` (bwrap) or `--volume` (podman/docker).
+**Outer-container privileges.** Nested bwrap needs the outer container to permit user + mount namespace creation. Pod Security Standards defaults block this. The helm chart sets `securityContext: capabilities.add: [SYS_ADMIN]`, `seccompProfile.type: Unconfined`, `appArmorProfile.type: Unconfined`. Trade-off: a zddc-server RCE has near-root power within the container's namespace, but the bind-mount layout (overlay fs, no host /home or /usr visible) still bounds the blast radius. The per-conversion bwrap sandbox is the real isolation boundary between zddc-server and untrusted pandoc/chromium.
+
+**Config knobs** (all in `cmd/zddc-server`):
+- `--convert-pandoc-binary` (default `pandoc`) / `--convert-chromium-binary` (default `chromium-browser`; `chromium` on debian)
+- `--convert-scratch-dir` (default `$TMPDIR`) — host scratch root; the wrapper bind-mounts the per-call subdir
+- `--convert-mem-mib` (default 1024) → wrapper's `memory.max`
+- `--convert-pids` (default 256) → wrapper's `pids.max`
+- `--convert-timeout` (default 60s) → enforced in Go via `context.WithTimeout`
+
+**Other plumbing.**
+- I/O via stdin/stdout + scratch dir. Pandoc reads markdown from stdin, writes to stdout. Templates + intermediate HTML + output PDF live in a per-call subdir under the scratch root; the dir's host path is passed to the child via `ZDDC_SCRATCH` so the wrapper bind-mounts it into the sandbox at the same path (no path translation).
 - Output cached at `/.converted/.` (hidden by the `.` prefix). mtime synced to source so the fast path is a stat-and-serve with no exec. PUT/DELETE/MOVE on the source `.md` purges the sidecars.
 - Per-project template variables (client/project/contractor/project_number) come from `.zddc` `convert:` cascade keys. Title/tracking_number/revision/status are derived from the filename via `zddc.ParseFilename`.
-- If no sandbox engine is found on PATH, the endpoint serves 503 with a Retry-After. The rest of the server keeps working.
+- If pandoc/chromium aren't on PATH (operator running zddc-server outside the runtime image), the endpoint serves 503 with a Retry-After. The rest of the server keeps working. Operators who run zddc-server with raw pandoc/chromium (no wrapper) get a working but unsandboxed conversion endpoint — useful for dev iteration.
 
 ## Form-data system (`form/` + zddc-server form handler)
 
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
index 24c495b..3a40473 100644
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -403,7 +403,7 @@ Files at the root level are ignored. The grouping folder list and transmittal fo
 
 **Dependencies:** Toast UI Editor v3.2.2 (vendored at `shared/vendor/toastui-editor-all.min.js`, concatenated into `browse/dist/browse.html` at build time). No runtime CDN, no Tailwind.
 
-**Server-mode features:** When the file handle is an `HttpFileHandle` (so `node.url` is set and `state.source === 'server'`), three Download buttons appear in the file header — DOCX/HTML/PDF — fetching `?convert=` via `window.zddc.source.downloadConverted()`. Clicks auto-save first if the buffer is dirty so converted bytes reflect what's on screen. The server-side engine is in `zddc/internal/convert` — bwrap is the default sandbox (per-call Linux namespaces, no daemon, pandoc/chromium binaries baked into the runtime image), with podman/docker as legacy OCI-image fallbacks for hosts that already have a container engine.
+**Server-mode features:** When the file handle is an `HttpFileHandle` (so `node.url` is set and `state.source === 'server'`), three Download buttons appear in the file header — DOCX/HTML/PDF — fetching `?convert=` via `window.zddc.source.downloadConverted()`. Clicks auto-save first if the buffer is dirty so converted bytes reflect what's on screen. The server-side engine is in `zddc/internal/convert`: zddc-server `exec.Command`s `pandoc` and `chromium-browser` directly, and the runtime image's wrapper at `/usr/local/bin/` (see `zddc/runtime.Containerfile` + `zddc/runtime/zddc-sandbox-exec`) handles the per-call cgroup v2 + bubblewrap sandbox between that exec and the real binary at `/usr/bin/`. Isolation strategy lives entirely in the image; swap the wrapper for firejail / nspawn / podman-run and Go doesn't change.
 
 ---
 
diff --git a/helm/zddc-server-prod/templates/deployment.yaml b/helm/zddc-server-prod/templates/deployment.yaml
index 10221c0..be54e27 100644
--- a/helm/zddc-server-prod/templates/deployment.yaml
+++ b/helm/zddc-server-prod/templates/deployment.yaml
@@ -64,7 +64,36 @@ spec:
         - name: zddc-server
           image: {{ printf "%s:%s" .Values.runtimeImage.repository .Values.runtimeImage.tag | quote }}
           imagePullPolicy: IfNotPresent
-          command: ["/zddc/zddc-server"]
+          # zddc-cgroup-init prepares cgroup v2 subtree_control then
+          # exec's zddc-server. Required because cgroup v2 forbids
+          # processes in a cgroup that has child cgroups; the per-
+          # conversion wrapper (zddc-sandbox-exec) creates child
+          # cgroups for resource caps, so the init script has to
+          # move zddc-server itself out of the root cgroup first.
+          # See zddc/runtime/zddc-cgroup-init in the source repo.
+          command: ["/usr/local/libexec/zddc-cgroup-init", "/zddc/zddc-server"]
+          # The conversion sandbox (bwrap, invoked per-call by
+          # /usr/local/bin/{pandoc,chromium-browser}) needs to create
+          # user + mount namespaces inside the container. Pod Security
+          # Standards default policies forbid this; the chart sets the
+          # minimum securityContext that lets bwrap function. If your
+          # cluster's admission controller rejects these settings, you
+          # have two choices: ask the platform team to allow this pod,
+          # or accept that /.convert serves 503 (the rest of zddc-
+          # server still works fine without conversion).
+          securityContext:
+            capabilities:
+              add: ["SYS_ADMIN"]
+            # cap-add SYS_ADMIN alone isn't enough — see the
+            # zddc/runtime/zddc-sandbox-exec docstring for the full
+            # set of LSM relaxations required. K8s 1.30+ supports
+            # specifying seccompProfile + appArmorProfile fields;
+            # if your cluster is older, you'll need annotations:
+            #   container.apparmor.security.beta.kubernetes.io/zddc-server: unconfined
+            seccompProfile:
+              type: Unconfined
+            appArmorProfile:
+              type: Unconfined
           ports:
             - name: http
               containerPort: 8080
diff --git a/zddc/cmd/zddc-server/main.go b/zddc/cmd/zddc-server/main.go
index 7d28625..4195281 100644
--- a/zddc/cmd/zddc-server/main.go
+++ b/zddc/cmd/zddc-server/main.go
@@ -87,29 +87,24 @@ func main() {
 		"addr", cfg.Addr,
 		"embedded_apps", embeddedVersionsForLog(embedded))
 
-	// Probe the container runtime for the MD→{docx,html,pdf} endpoint.
-	// Non-fatal: if the host has no podman/docker (or the remote
-	// socket is unreachable in sidecar mode), conversion requests
-	// return 503 and everything else keeps working. The probe installs
-	// the package-level Runner when an engine is found; the configured
-	// Sandbox probe order is bwrap → podman → docker. The
-	// production-default bwrap engine reads the binary names below
-	// (pandoc + chromium are baked into the zddc-server image);
-	// the legacy OCI engines read the image refs and pull them
-	// lazily on first conversion via `--pull=missing`. The probe
-	// installs whichever runner the engine resolves to.
+	// Probe pandoc + chromium for the MD→{docx,html,pdf} endpoint.
+	// Non-fatal: if either binary isn't on PATH (operator running
+	// zddc-server outside the runtime image), conversion requests
+	// return 503 and everything else keeps working.
 	//
-	// SetRemoteURL + SetScratchDir must run BEFORE Probe so the
-	// OCI-engine path can hit the sidecar socket when one is
-	// configured; bwrap ignores both.
-	convert.SetImages(cfg.ConvertPandocImage, cfg.ConvertChromiumImage)
+	// In the production runtime image, "pandoc" and "chromium-browser"
+	// on PATH resolve to wrapper scripts at /usr/local/bin/
+	// that put the real binary into a cgroup v2 + bwrap sandbox
+	// before exec'ing it. zddc-server is unaware — it just sees
+	// the corresponding tool's behavior. The wrapper reads
+	// ZDDC_CONV_MEM_MAX, ZDDC_CONV_PIDS_MAX, and ZDDC_SCRATCH from
+	// the child env to drive cgroup setup + scratch-dir bind mount.
 	convert.SetBinaries(cfg.ConvertPandocBinary, cfg.ConvertChromiumBinary)
-	convert.SetRemoteURL(cfg.ConvertPodmanSocket)
 	convert.SetScratchDir(cfg.ConvertScratchDir)
 	probeCtx, probeCancel := context.WithTimeout(context.Background(), 5*time.Second)
-	convert.Probe(probeCtx, cfg.ConvertEngine)
+	convert.Probe(probeCtx)
 	probeCancel()
-	convert.ConfigureLimits(cfg.ConvertMemMiB, cfg.ConvertCPUs, cfg.ConvertPIDs, cfg.ConvertTimeout)
+	convert.ConfigureLimits(cfg.ConvertMemMiB, cfg.ConvertPIDs, cfg.ConvertTimeout)
 
 	// Client mode short-circuit: when cfg.Upstream is set, this binary
 	// runs as a downstream proxy/cache/mirror rather than a master.
diff --git a/zddc/internal/config/config.go b/zddc/internal/config/config.go
index 26621ee..54c6da5 100644
--- a/zddc/internal/config/config.go
+++ b/zddc/internal/config/config.go
@@ -48,26 +48,18 @@ type Config struct {
 	ArchiveRescanInterval time.Duration // --archive-rescan-interval / ZDDC_ARCHIVE_RESCAN_INTERVAL — periodic full re-walk of the archive index. Covers SMB/CIFS where inotify misses cross-client writes. Default 60s; 0 to disable.
 
 	// MD→{docx,html,pdf} conversion endpoint (see internal/convert).
-	// The server shells out to upstream pandoc + chromium container
-	// images via podman or docker, pulling each on first use via
-	// production default. The engine probe order is bwrap → podman →
-	// docker; the first one found on PATH wins. bwrap runs the
-	// pandoc + chromium binaries baked into the zddc-server image
-	// in a per-call Linux-namespace sandbox (no daemon, no socket,
-	// no OCI image pull). podman/docker are legacy fallbacks for
-	// hosts that already have a container engine and want OCI-image
-	// isolation per conversion.
-	ConvertPandocImage    string        // --convert-pandoc-image / ZDDC_CONVERT_PANDOC_IMAGE — image for MD→DOCX/HTML when the OCI engine is selected. Default docker.io/pandoc/latex:latest.
-	ConvertChromiumImage  string        // --convert-chromium-image / ZDDC_CONVERT_CHROMIUM_IMAGE — image for HTML→PDF when the OCI engine is selected. Default docker.io/zenika/alpine-chrome:latest.
-	ConvertPandocBinary   string        // --convert-pandoc-binary / ZDDC_CONVERT_PANDOC_BINARY — pandoc binary name (PATH-resolved) when the bwrap engine is selected. Default "pandoc".
-	ConvertChromiumBinary string        // --convert-chromium-binary / ZDDC_CONVERT_CHROMIUM_BINARY — chromium binary name (PATH-resolved) when the bwrap engine is selected. Default "chromium-browser" (alpine); set to "chromium" on debian.
-	ConvertEngine         string        // --convert-engine / ZDDC_CONVERT_ENGINE — override sandbox binary (default: probe for bwrap, then podman, then docker).
-	ConvertPodmanSocket   string        // --convert-podman-socket / ZDDC_CONVERT_PODMAN_SOCKET — when non-empty, run podman in remote mode against this Unix socket (e.g. unix:///var/run/podman/podman.sock). Used with the Kubernetes sidecar pattern so zddc-server's own pod stays unprivileged.
-	ConvertScratchDir     string        // --convert-scratch-dir / ZDDC_CONVERT_SCRATCH_DIR — directory used for per-conversion scratch (template + HTML/PDF intermediates). Must be a path the remote podman can see at the same path. Empty = use $TMPDIR (local-mode default).
-	ConvertMemMiB         int           // --convert-mem-mib / ZDDC_CONVERT_MEM_MIB — per-container memory cap in MiB. Default 512.
-	ConvertCPUs           string        // --convert-cpus / ZDDC_CONVERT_CPUS — per-container CPU limit. Default "2".
-	ConvertPIDs           int           // --convert-pids / ZDDC_CONVERT_PIDS — per-container PID limit. Default 100.
-	ConvertTimeout        time.Duration // --convert-timeout / ZDDC_CONVERT_TIMEOUT — per-conversion wall clock. Default 30s.
+	// zddc-server exec's `pandoc` and `chromium-browser` directly.
+	// In the production runtime image those names resolve to wrapper
+	// scripts at /usr/local/bin/ that put the real binary into a
+	// cgroup v2 + bubblewrap sandbox before exec'ing it — see
+	// zddc/runtime.Containerfile + zddc/runtime/zddc-sandbox-exec.
+	// zddc-server is unaware of sandboxing; the image owns it.
+	ConvertPandocBinary   string        // --convert-pandoc-binary / ZDDC_CONVERT_PANDOC_BINARY — pandoc binary name (PATH-resolved) or absolute path. Default "pandoc". Resolves to the wrapper script in the runtime image.
+	ConvertChromiumBinary string        // --convert-chromium-binary / ZDDC_CONVERT_CHROMIUM_BINARY — chromium binary name (PATH-resolved) or absolute path. Default "chromium-browser" (alpine); set to "chromium" on debian.
+	ConvertScratchDir     string        // --convert-scratch-dir / ZDDC_CONVERT_SCRATCH_DIR — directory used for per-conversion scratch (template + HTML/PDF intermediates). The wrapper bind-mounts this into the sandbox at the same path. Empty = use $TMPDIR.
+	ConvertMemMiB         int           // --convert-mem-mib / ZDDC_CONVERT_MEM_MIB — per-conversion memory cap in MiB (advisory; passed to the wrapper via ZDDC_CONV_MEM_MAX, applied as cgroup v2 memory.max). Default 1024.
+	ConvertPIDs           int           // --convert-pids / ZDDC_CONVERT_PIDS — per-conversion PID cap (passed to the wrapper via ZDDC_CONV_PIDS_MAX, applied as cgroup v2 pids.max). Default 256.
+	ConvertTimeout        time.Duration // --convert-timeout / ZDDC_CONVERT_TIMEOUT — per-conversion wall clock (enforced in zddc-server via context.WithTimeout). Default 60s.
 }
 
 // ErrHelpRequested is returned by Load when --help is passed; the caller
@@ -146,28 +138,18 @@ func Load(args []string) (Config, error) {
 		"Maximum PUT body size in bytes for the file API. Default 256 MiB. Larger requests are rejected with 413.")
 	archiveRescanIntervalFlag := fs.Duration("archive-rescan-interval", parseDurationOrDefault(os.Getenv("ZDDC_ARCHIVE_RESCAN_INTERVAL"), 60*time.Second),
 		"Periodic full re-walk of the archive index. Required on SMB/CIFS-backed roots where inotify misses cross-client writes. Default 60s; set 0 to disable.")
-	convertPandocImageFlag := fs.String("convert-pandoc-image", getEnv("ZDDC_CONVERT_PANDOC_IMAGE", "docker.io/pandoc/latex:latest"),
-		"Pandoc OCI image for MD→DOCX / MD→HTML, used only when the OCI engine (podman/docker) is selected. Pulled on first use via --pull=missing.")
-	convertChromiumImageFlag := fs.String("convert-chromium-image", getEnv("ZDDC_CONVERT_CHROMIUM_IMAGE", "docker.io/zenika/alpine-chrome:latest"),
-		"Chromium OCI image for HTML→PDF, used only when the OCI engine is selected. Pulled on first use via --pull=missing.")
 	convertPandocBinaryFlag := fs.String("convert-pandoc-binary", getEnv("ZDDC_CONVERT_PANDOC_BINARY", "pandoc"),
-		"Pandoc binary name (PATH-resolved) when the bwrap engine is selected. Default \"pandoc\".")
+		"Pandoc binary name (PATH-resolved) or absolute path. Default \"pandoc\". In the runtime image this resolves to the wrapper at /usr/local/bin/pandoc which sandboxes the real binary.")
 	convertChromiumBinaryFlag := fs.String("convert-chromium-binary", getEnv("ZDDC_CONVERT_CHROMIUM_BINARY", "chromium-browser"),
-		"Chromium binary name (PATH-resolved) when the bwrap engine is selected. Default \"chromium-browser\" (alpine); set to \"chromium\" on debian/ubuntu.")
-	convertEngineFlag := fs.String("convert-engine", os.Getenv("ZDDC_CONVERT_ENGINE"),
-		"Conversion sandbox override (default: probe for bwrap, then podman, then docker).")
-	convertPodmanSocketFlag := fs.String("convert-podman-socket", os.Getenv("ZDDC_CONVERT_PODMAN_SOCKET"),
-		"Run podman in remote mode against this Unix socket URL (e.g. unix:///var/run/podman/podman.sock). When set, the engine binary is invoked as `podman --remote --url= run …`; the actual container creation happens in whatever process owns the socket (typically a podman-system-service sidecar). Empty = local mode.")
+		"Chromium binary name (PATH-resolved) or absolute path. Default \"chromium-browser\" (alpine); set to \"chromium\" on debian/ubuntu.")
 	convertScratchDirFlag := fs.String("convert-scratch-dir", os.Getenv("ZDDC_CONVERT_SCRATCH_DIR"),
-		"Scratch directory for per-conversion intermediates (template, HTML, PDF). In remote mode this MUST be a path that the podman-service side can see at the same path — typically a shared emptyDir mounted at the same mountPath in both containers. Empty = use $TMPDIR (local mode).")
-	convertMemMiBFlag := fs.Int("convert-mem-mib", parseIntOrDefault(os.Getenv("ZDDC_CONVERT_MEM_MIB"), 512),
-		"Per-conversion container memory limit in MiB. Default 512.")
-	convertCPUsFlag := fs.String("convert-cpus", getEnv("ZDDC_CONVERT_CPUS", "2"),
-		"Per-conversion container CPU limit (passed to --cpus). Default 2.")
-	convertPIDsFlag := fs.Int("convert-pids", parseIntOrDefault(os.Getenv("ZDDC_CONVERT_PIDS"), 100),
-		"Per-conversion container PID limit. Default 100.")
-	convertTimeoutFlag := fs.Duration("convert-timeout", parseDurationOrDefault(os.Getenv("ZDDC_CONVERT_TIMEOUT"), 30*time.Second),
-		"Per-conversion wall-clock timeout. Default 30s.")
+		"Scratch directory for per-conversion intermediates (template, HTML, PDF). The runtime image's wrapper bind-mounts this into the sandbox at the same path. Empty = use $TMPDIR.")
+	convertMemMiBFlag := fs.Int("convert-mem-mib", parseIntOrDefault(os.Getenv("ZDDC_CONVERT_MEM_MIB"), 1024),
+		"Per-conversion memory limit in MiB (advisory; passed to the runtime-image wrapper via ZDDC_CONV_MEM_MAX, applied as cgroup v2 memory.max). Default 1024.")
+	convertPIDsFlag := fs.Int("convert-pids", parseIntOrDefault(os.Getenv("ZDDC_CONVERT_PIDS"), 256),
+		"Per-conversion PID limit (passed to the runtime-image wrapper via ZDDC_CONV_PIDS_MAX, applied as cgroup v2 pids.max). Default 256.")
+	convertTimeoutFlag := fs.Duration("convert-timeout", parseDurationOrDefault(os.Getenv("ZDDC_CONVERT_TIMEOUT"), 60*time.Second),
+		"Per-conversion wall-clock timeout (enforced in zddc-server via context.WithTimeout). Default 60s.")
 	accessLogFlag := fs.String("access-log", os.Getenv("ZDDC_ACCESS_LOG"),
 		"Tee structured access logs to this file (JSON, size-rotated). "+
 			"Default: /.zddc.d/logs/access-.log. "+
@@ -239,17 +221,12 @@ func Load(args []string) (Config, error) {
 		AppsPubKey:            *appsPubKeyFlag,
 		MaxWriteBytes:         *maxWriteBytesFlag,
 		ArchiveRescanInterval: *archiveRescanIntervalFlag,
-		ConvertPandocImage:    *convertPandocImageFlag,
-		ConvertChromiumImage:  *convertChromiumImageFlag,
 		ConvertPandocBinary:   *convertPandocBinaryFlag,
 		ConvertChromiumBinary: *convertChromiumBinaryFlag,
-		ConvertEngine:         *convertEngineFlag,
-		ConvertPodmanSocket:   *convertPodmanSocketFlag,
-		ConvertScratchDir:     *convertScratchDirFlag,
-		ConvertMemMiB:         *convertMemMiBFlag,
-		ConvertCPUs:           *convertCPUsFlag,
-		ConvertPIDs:           *convertPIDsFlag,
-		ConvertTimeout:        *convertTimeoutFlag,
+		ConvertScratchDir:     *convertScratchDirFlag,
+		ConvertMemMiB:         *convertMemMiBFlag,
+		ConvertPIDs:           *convertPIDsFlag,
+		ConvertTimeout:        *convertTimeoutFlag,
 	}
 
 	// Default Root to the current working directory.
diff --git a/zddc/internal/convert/convert.go b/zddc/internal/convert/convert.go
index 767b47c..a2d254a 100644
--- a/zddc/internal/convert/convert.go
+++ b/zddc/internal/convert/convert.go
@@ -1,20 +1,15 @@
 // Package convert turns a markdown source byte-buffer into DOCX, HTML,
-// or PDF. Pandoc handles MD↔DOCX and MD→HTML; headless Chromium handles
-// HTML→PDF. Each conversion runs inside an isolating sandbox so an
-// untrusted source-markdown can't reach the host's filesystem or
-// network even if it drives the binary to RCE.
+// or PDF by exec'ing pandoc and chromium-browser. Each conversion runs
+// inside a sandbox provided by the IMAGE — typically a wrapper script
+// at /usr/local/bin/ that puts the real binary into a cgroup
+// v2 + bubblewrap sandbox before exec'ing it. See
+// zddc/runtime.Containerfile for the production setup.
 //
-// Engine probe order (call Probe once at startup, first hit wins):
-//
-//   1. bwrap (production default). Runs the pandoc/chromium binaries
-//      baked into the zddc-server runtime image directly under
-//      bubblewrap: namespace-isolated, no network, read-only /usr, a
-//      256 MiB tmpfs /tmp, minimal proc/dev. Configure binary names
-//      via SetBinaries; defaults are `pandoc` and `chromium-browser`.
-//   2. podman / docker (legacy fallback). Runs each conversion inside
-//      an OCI container pulled lazily via `--pull=missing`. Defaults
-//      `docker.io/pandoc/latex:latest` + `docker.io/zenika/alpine-
-//      chrome:latest`; configure via SetImages.
+// zddc-server's Go code is unaware of sandboxing: it just exec's
+// "pandoc" or "chromium-browser" and gets the corresponding tool's
+// behavior back. Operators who want a different isolation strategy
+// (firejail, systemd-nspawn, podman-run, raw exec for dev) replace
+// the wrapper script in their image; the Go binary doesn't change.
 //
 // Public surface:
 //
@@ -22,16 +17,13 @@
 //   ToHTML(ctx, source, meta) → []byte (standalone HTML)
 //   ToPDF (ctx, source, meta) → []byte (PDF, via HTML + chromium)
 //
-//   Probe(ctx, override)          → Capabilities (call once at startup)
+//   Probe(ctx)                    → Capabilities (call once at startup)
 //   Available()                   → (Capabilities, bool)
-//   SetImages(pandoc, chromium)   — install OCI image refs from config
-//   SetBinaries(pandoc, chromium) — install bwrap binary names from config
+//   SetBinaries(pandoc, chromium) — install binary names from config
+//   SetScratchDir(dir)            — install scratch root from config
 //
 // All three converters are safe for concurrent use; each call gets a
-// fresh sandbox. The pandoc binary (or pandoc/latex image's entrypoint)
-// reads pandoc flags directly; the chromium binary (or alpine-chrome
-// image's entrypoint) reads chromium-browser flags. No `sh -c`
-// wrappers, no shell quoting.
+// fresh scratch dir + (image-provided) sandbox.
 //
 // Metadata maps to the placeholders consumed by viewer-template.html.
 // title/tracking_number/revision/status/is_draft typically come from
@@ -66,55 +58,33 @@ type Metadata struct {
 	NoTOC bool
 }
 
-// Default tool refs. The bwrap engine (default since v0.0.x) reads the
-// Binary fields below; the legacy containerRunner reads the Image
-// fields. The convert entry points populate both into a ToolSpec so
-// whichever engine is installed picks the field it needs.
+// Default binary names. The runtime image installs WRAPPER scripts at
+// /usr/local/bin/pandoc and /usr/local/bin/chromium-browser (shadowing
+// the real binaries in /usr/bin/) so these names resolve through the
+// sandbox automatically. Operators running zddc-server outside the
+// runtime image with raw binaries on PATH still get a working
+// conversion endpoint — just without the per-call sandbox.
 //
-// pandoc/latex carries TeX Live for native PDF too, so the image is a
-// superset of pandoc/core. The bwrap engine doesn't pay that cost —
-// each binary is installed from the host's package manager (alpine:
-// pandoc-cli + chromium) and the image grows by ≈ 200 MB once.
+// Alpine's chromium package installs the binary as "chromium-browser";
+// debian/ubuntu ships "chromium". Operators override via
+// --convert-chromium-binary when the package on their image differs.
 const (
-	DefaultPandocImage   = "docker.io/pandoc/latex:latest"
-	DefaultChromiumImage = "docker.io/zenika/alpine-chrome:latest"
-	DefaultPandocBinary  = "pandoc"
-	// Alpine's chromium package installs the binary as "chromium-browser".
-	// Debian/Ubuntu ships "chromium". Operators override via
-	// --convert-chromium-binary when the package on their image differs.
+	DefaultPandocBinary   = "pandoc"
 	DefaultChromiumBinary = "chromium-browser"
 )
 
 var (
-	pandocImage    atomic.Pointer[string]
-	chromiumImage  atomic.Pointer[string]
 	pandocBinary   atomic.Pointer[string]
 	chromiumBinary atomic.Pointer[string]
 	scratchDir     atomic.Pointer[string]
 )
 
-// SetImages installs the OCI image refs used by the legacy
-// containerRunner engine. The bwrap engine ignores these and reads
-// the binary names installed via SetBinaries instead. Empty values
-// keep the previous setting (or the DefaultPandocImage /
-// DefaultChromiumImage constants on first call). Called from
-// cmd/zddc-server/main.go after flag parsing.
-func SetImages(pandoc, chromium string) {
-	if pandoc != "" {
-		s := pandoc
-		pandocImage.Store(&s)
-	}
-	if chromium != "" {
-		s := chromium
-		chromiumImage.Store(&s)
-	}
-}
-
-// SetBinaries installs the host-binary names used by the bwrap engine.
-// Empty values keep the previous setting (or the DefaultPandocBinary /
+// SetBinaries installs the binary names used by Probe/Run. Empty
+// values keep the previous setting (or the DefaultPandocBinary /
 // DefaultChromiumBinary constants on first call). The values are
-// PATH-resolved names (e.g. "pandoc", "chromium-browser") or absolute
-// paths. Called from cmd/zddc-server/main.go after flag parsing.
+// PATH-resolved names (e.g. "pandoc", "chromium-browser") or
+// absolute paths. Called from cmd/zddc-server/main.go after flag
+// parsing.
 func SetBinaries(pandoc, chromium string) {
 	if pandoc != "" {
 		s := pandoc
@@ -126,12 +96,11 @@ func SetBinaries(pandoc, chromium string) {
 	}
 }
 
-// SetScratchDir installs the host-side scratch root used for per-call
-// intermediates (template, HTML, PDF). Empty means "use $TMPDIR" — the
-// local-mode default. In remote mode this MUST be a path the podman-
-// service sidecar can see at the same mountpoint, typically a shared
-// emptyDir mounted at /work in both containers. Called from
-// cmd/zddc-server/main.go after flag parsing.
+// SetScratchDir installs the host-side scratch root used for
+// per-call intermediates (template, HTML, PDF). Empty means "use
+// $TMPDIR". The runtime-image wrapper bind-mounts the per-call
+// scratch dir into its sandbox at the same path, so any path under
+// this root works.
 func SetScratchDir(dir string) {
 	s := dir
 	scratchDir.Store(&s)
@@ -144,20 +113,6 @@ func currentScratchDir() string {
 	return ""
 }
 
-func currentPandocImage() string {
-	if p := pandocImage.Load(); p != nil && *p != "" {
-		return *p
-	}
-	return DefaultPandocImage
-}
-
-func currentChromiumImage() string {
-	if p := chromiumImage.Load(); p != nil && *p != "" {
-		return *p
-	}
-	return DefaultChromiumImage
-}
-
 func currentPandocBinary() string {
 	if p := pandocBinary.Load(); p != nil && *p != "" {
 		return *p
@@ -172,20 +127,10 @@ func currentChromiumBinary() string {
 	return DefaultChromiumBinary
 }
 
-// pandocTool / chromiumTool build the ToolSpec passed to Runner.Run.
-// Both fields are populated so whichever engine is installed picks
-// the one it needs (bwrap reads Binary; containerRunner reads Image).
-func pandocTool() ToolSpec { - return ToolSpec{Image: currentPandocImage(), Binary: currentPandocBinary()} -} - -func chromiumTool() ToolSpec { - return ToolSpec{Image: currentChromiumImage(), Binary: currentChromiumBinary()} -} - -// ToDocx renders source markdown to DOCX bytes. One container run via -// the pandoc image. Caller passes the full file content (envelope + -// body); pandoc handles `markdown+yaml_metadata_block` natively. +// ToDocx renders source markdown to DOCX bytes. Single pandoc exec; +// no scratch dir needed (stdin → stdout). The caller passes the +// full file content (envelope + body); pandoc handles +// `markdown+yaml_metadata_block` natively. func ToDocx(ctx context.Context, source []byte, m Metadata) ([]byte, error) { r := currentRunner() if r == nil { @@ -198,13 +143,14 @@ func ToDocx(ctx context.Context, source []byte, m Metadata) ([]byte, error) { } cmd = append(cmd, metadataArgs(m)...) cmd = append(cmd, "-") - return r.Run(ctx, pandocTool(), source, nil, cmd) + return r.Run(ctx, currentPandocBinary(), source, "", cmd) } // ToHTML renders source markdown to standalone HTML using // viewer-template.html. Embeds CSS + images via --embed-resources. -// Template + custom.css are bind-mounted into the container at /tpl -// from a per-call scratch dir. +// Template + custom.css live in a per-call scratch dir; the host +// path is passed via ZDDC_SCRATCH so the wrapper bind-mounts it +// into the sandbox at the same path. 
func ToHTML(ctx context.Context, source []byte, m Metadata) ([]byte, error) { r := currentRunner() if r == nil { @@ -216,6 +162,7 @@ func ToHTML(ctx context.Context, source []byte, m Metadata) ([]byte, error) { } defer os.RemoveAll(scratch) + tplPath := filepath.Join(scratch, "viewer-template.html") cmd := []string{ "--from=markdown+yaml_metadata_block", "--to=html5", @@ -224,29 +171,27 @@ func ToHTML(ctx context.Context, source []byte, m Metadata) ([]byte, error) { "--section-divs", "--id-prefix=", "--html-q-tags", - "--template=/tpl/viewer-template.html", + "--template=" + tplPath, } if !m.NoTOC { cmd = append(cmd, "--toc", "--toc-depth=6") } cmd = append(cmd, metadataArgs(m)...) cmd = append(cmd, "--output=-", "-") - - mounts := []string{scratch + ":/tpl:ro"} - return r.Run(ctx, pandocTool(), source, mounts, cmd) + return r.Run(ctx, currentPandocBinary(), source, scratch, cmd) } -// ToPDF renders source markdown to PDF in two stages: pandoc produces -// HTML using viewer-template.html (stage 1, pandoc image), then headless -// Chromium prints that HTML to PDF (stage 2, chromium image). The -// two-stage choice preserves the print-media CSS already authored in -// viewer-template.html — pandoc's native --pdf-engine path uses LaTeX +// ToPDF renders source markdown to PDF in two stages: pandoc +// produces HTML using viewer-template.html (stage 1), then headless +// chromium prints that HTML to PDF (stage 2). The two-stage choice +// preserves the print-media CSS already authored in viewer- +// template.html — pandoc's native --pdf-engine path uses LaTeX // which would bypass it entirely. // -// Chromium runs from the alpine-chrome image whose entrypoint is -// `chromium-browser`; our cmd is the flag list passed straight to that -// binary. The host scratch dir is bind-mounted read-write at /pdf so -// chromium can write out.pdf and we read it back afterward. 
+// Both stages share a single per-call scratch dir: pandoc writes +// `in.html` and chromium reads it, then chromium writes `out.pdf` +// which the host reads back. The wrapper bind-mounts the scratch +// dir read-write into the sandbox at the same path. func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) { html, err := ToHTML(ctx, source, m) if err != nil { @@ -271,17 +216,11 @@ func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) { return nil, err } - mounts := []string{scratch + ":/pdf:rw"} - // alpine-chrome's entrypoint is `chromium-browser`. --no-sandbox is - // required because the container drops CAP_SYS_ADMIN; the threat - // model is "malicious markdown drives chromium RCE", contained by - // --network=none + --cap-drop=ALL + --read-only + tmpfs. - // - // --disable-dev-shm-usage: without this, chromium tries to allocate - // shared memory under /dev/shm, which our --read-only container - // can't write to. The flag tells chromium to fall back to /tmp, - // which is a writable tmpfs (sized in runner.go). Standard fix for - // chromium-in-container; required by every CI/headless setup. + // --no-sandbox: the wrapper provides the sandbox; chromium's + // own setuid sandbox would conflict (and fails inside our + // user namespace anyway). --disable-dev-shm-usage: chromium + // uses /dev/shm by default, which our sandbox doesn't expose; + // the flag redirects shared memory to /tmp (the wrapper's tmpfs).
cmd := []string{ "--headless", "--disable-gpu", @@ -290,10 +229,10 @@ func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) { "--user-data-dir=/tmp/chrome", "--no-pdf-header-footer", "--virtual-time-budget=10000", - "--print-to-pdf=/pdf/out.pdf", - "file:///pdf/in.html", + "--print-to-pdf=" + pdfPath, + "file://" + htmlPath, } - if _, err := r.Run(ctx, chromiumTool(), nil, mounts, cmd); err != nil { + if _, err := r.Run(ctx, currentChromiumBinary(), nil, scratch, cmd); err != nil { return nil, err } @@ -303,7 +242,7 @@ func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) { } if len(out) < 4 || string(out[:4]) != "%PDF" { return nil, &ConvertError{ - Tool: "chromium", + Tool: currentChromiumBinary(), ExitCode: 0, Stderr: "chromium did not produce a valid PDF", Cause: fmt.Errorf("invalid PDF magic in output (got %d bytes)", len(out)), @@ -312,9 +251,9 @@ func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) { return out, nil } -// metadataArgs renders Metadata into pandoc -V flags. Order is stable -// so test fixtures don't churn. Empty values are omitted (the template -// uses $if(...)$ blocks). +// metadataArgs renders Metadata into pandoc -V flags. Order is +// stable so test fixtures don't churn. Empty values are omitted +// (the template uses $if(...)$ blocks). func metadataArgs(m Metadata) []string { var out []string add := func(k, v string) { diff --git a/zddc/internal/convert/convert_test.go b/zddc/internal/convert/convert_test.go index 0c41e39..30b6d17 100644 --- a/zddc/internal/convert/convert_test.go +++ b/zddc/internal/convert/convert_test.go @@ -10,25 +10,25 @@ import ( ) // fakeRunner records the args it was invoked with and replays canned -// responses. Lets us assert the command lines + image refs without -// needing podman. +// responses. Lets us assert command lines + binary refs + scratch +// dirs without needing actual pandoc. 
type fakeRunner struct { - mu sync.Mutex - calls [][]string - tools []ToolSpec - stdin [][]byte - mounts [][]string - resp []byte - err error + mu sync.Mutex + calls [][]string + binaries []string + stdin [][]byte + scratchDir []string + resp []byte + err error } -func (f *fakeRunner) Run(_ context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error) { +func (f *fakeRunner) Run(_ context.Context, binary string, stdin []byte, scratchDir string, cmd []string) ([]byte, error) { f.mu.Lock() defer f.mu.Unlock() f.calls = append(f.calls, append([]string(nil), cmd...)) - f.tools = append(f.tools, tool) + f.binaries = append(f.binaries, binary) f.stdin = append(f.stdin, append([]byte(nil), stdin...)) - f.mounts = append(f.mounts, append([]string(nil), mounts...)) + f.scratchDir = append(f.scratchDir, scratchDir) return f.resp, f.err } @@ -38,14 +38,14 @@ func (f *fakeRunner) lastCall() (string, []string) { if len(f.calls) == 0 { return "", nil } - return f.tools[len(f.tools)-1].Image, f.calls[len(f.calls)-1] + return f.binaries[len(f.binaries)-1], f.calls[len(f.calls)-1] } -func TestToDocx_UsesPandocImage(t *testing.T) { +func TestToDocx_UsesPandocBinary(t *testing.T) { f := &fakeRunner{resp: []byte("FAKE-DOCX")} InstallRunner(f) t.Cleanup(func() { InstallRunner(nil) }) - SetImages("docker.io/pandoc/latex:latest", "") + SetBinaries("pandoc", "chromium-browser") out, err := ToDocx(context.Background(), []byte("# Hello\n"), Metadata{ Title: "Hello", @@ -57,9 +57,9 @@ func TestToDocx_UsesPandocImage(t *testing.T) { if string(out) != "FAKE-DOCX" { t.Errorf("unexpected output: %q", out) } - image, call := f.lastCall() - if image != "docker.io/pandoc/latex:latest" { - t.Errorf("expected pandoc image, got %q", image) + binary, call := f.lastCall() + if binary != "pandoc" { + t.Errorf("expected pandoc binary, got %q", binary) } if !contains(call, "--to=docx") { t.Errorf("missing --to=docx: %v", call) @@ -74,35 +74,40 @@ func 
TestToDocx_UsesPandocImage(t *testing.T) { if call[len(call)-1] != "-" { t.Errorf("expected stdin marker as last arg, got %q", call[len(call)-1]) } + // ToDocx is stdin → stdout — no scratch dir needed. + if f.scratchDir[len(f.scratchDir)-1] != "" { + t.Errorf("ToDocx should not need a scratch dir, got %q", f.scratchDir[len(f.scratchDir)-1]) + } } -func TestToHTML_UsesTemplateAndMountsScratch(t *testing.T) { +func TestToHTML_UsesTemplateFromScratchDir(t *testing.T) { f := &fakeRunner{resp: []byte("fake")} InstallRunner(f) t.Cleanup(func() { InstallRunner(nil) }) - SetImages("docker.io/pandoc/latex:latest", "") + SetBinaries("pandoc", "chromium-browser") _, err := ToHTML(context.Background(), []byte("# Hi\n"), Metadata{Title: "Hi"}) if err != nil { t.Fatalf("ToHTML: %v", err) } - image, call := f.lastCall() - if image != "docker.io/pandoc/latex:latest" { - t.Errorf("expected pandoc image, got %q", image) + binary, call := f.lastCall() + if binary != "pandoc" { + t.Errorf("expected pandoc binary, got %q", binary) } - if !contains(call, "--template=/tpl/viewer-template.html") { - t.Errorf("template flag missing: %v", call) + // Template flag must reference an absolute path under the scratch + // dir (no /tpl indirection anymore — the wrapper bind-mounts the + // scratch dir at its own path, so absolute host paths just work). 
+ scratch := f.scratchDir[len(f.scratchDir)-1] + if scratch == "" { + t.Fatalf("ToHTML must pass a scratch dir to the runner") + } + wantTpl := "--template=" + scratch + "/viewer-template.html" + if !contains(call, wantTpl) { + t.Errorf("template flag missing/wrong; want %q in %v", wantTpl, call) } if !contains(call, "--toc") { t.Errorf("TOC flag missing (default NoTOC=false): %v", call) } - if len(f.mounts) == 0 || len(f.mounts[0]) == 0 { - t.Fatalf("expected at least one bind mount for /tpl") - } - mount := f.mounts[0][0] - if !strings.Contains(mount, ":/tpl:") { - t.Errorf("mount missing /tpl: %q", mount) - } } func TestToHTML_NoTOCSuppressesTOC(t *testing.T) { @@ -120,9 +125,9 @@ func TestToHTML_NoTOCSuppressesTOC(t *testing.T) { } } -// recordingRunner records every call and returns canned responses -// in sequence. Lets ToPDF tests assert the two-stage pipeline -// (pandoc image then chromium image). +// recordingRunner records every call and returns canned responses in +// sequence. Lets ToPDF tests assert the two-stage pipeline (pandoc +// then chromium). 
type recordingRunner struct { mu sync.Mutex calls []recordedCall @@ -132,18 +137,18 @@ type recordingRunner struct { } type recordedCall struct { - image string - cmd []string - mounts []string + binary string + cmd []string + scratch string } -func (r *recordingRunner) Run(_ context.Context, tool ToolSpec, _ []byte, mounts []string, cmd []string) ([]byte, error) { +func (r *recordingRunner) Run(_ context.Context, binary string, _ []byte, scratch string, cmd []string) ([]byte, error) { r.mu.Lock() defer r.mu.Unlock() r.calls = append(r.calls, recordedCall{ - image: tool.Image, - cmd: append([]string(nil), cmd...), - mounts: append([]string(nil), mounts...), + binary: binary, + cmd: append([]string(nil), cmd...), + scratch: scratch, }) if r.cursor >= len(r.resp) { return nil, nil @@ -169,57 +174,63 @@ func TestScratchDir_UsedByToHTML(t *testing.T) { if err != nil { t.Fatalf("ToHTML: %v", err) } - if len(f.mounts) == 0 || len(f.mounts[0]) == 0 { - t.Fatalf("expected at least one mount") + if len(f.scratchDir) == 0 { + t.Fatalf("expected a scratch dir to be passed to the runner") } - mount := f.mounts[0][0] // ":/tpl:ro" - if !strings.HasPrefix(mount, scratchRoot+"/") { - t.Errorf("scratch dir not under configured root: %q (root=%q)", mount, scratchRoot) + got := f.scratchDir[0] + if !strings.HasPrefix(got, scratchRoot+"/") { + t.Errorf("scratch dir not under configured root: %q (root=%q)", got, scratchRoot) } } func TestToPDF_TwoStagePipeline(t *testing.T) { // Stage 1: pandoc emits HTML. Stage 2: chromium reads HTML from - // the bind mount and writes /pdf/out.pdf. The fake runner can't + // the scratch dir and writes out.pdf there. The fake runner can't // actually write the PDF, so we expect ToPDF to fail at the // read-back step — but we can still assert the two-stage call - // shape and the right image per stage. + // shape and the right binary per stage. 
r := &recordingRunner{ resp: [][]byte{ []byte("fake"), // stage 1 stdout - nil, // stage 2 stdout (chromium writes PDF to bind mount) + nil, // stage 2 stdout (chromium writes PDF to scratch) }, } InstallRunner(r) t.Cleanup(func() { InstallRunner(nil) }) - SetImages("docker.io/pandoc/latex:latest", "docker.io/zenika/alpine-chrome:latest") + SetBinaries("pandoc", "chromium-browser") _, err := ToPDF(context.Background(), []byte("# Hi\n"), Metadata{}) // PDF read-back will fail (fake runner didn't write the file) — - // that's expected for this test which only inspects the call - // shape. + // that's expected for this test which only inspects the call shape. if err == nil { t.Fatalf("expected error from PDF read-back; got nil") } if len(r.calls) != 2 { - t.Fatalf("expected 2 container calls (pandoc + chromium); got %d", len(r.calls)) + t.Fatalf("expected 2 calls (pandoc + chromium); got %d", len(r.calls)) } - if r.calls[0].image != "docker.io/pandoc/latex:latest" { - t.Errorf("stage 1 image: got %q want pandoc/latex", r.calls[0].image) + if r.calls[0].binary != "pandoc" { + t.Errorf("stage 1 binary: got %q want pandoc", r.calls[0].binary) } - if r.calls[1].image != "docker.io/zenika/alpine-chrome:latest" { - t.Errorf("stage 2 image: got %q want alpine-chrome", r.calls[1].image) + if r.calls[1].binary != "chromium-browser" { + t.Errorf("stage 2 binary: got %q want chromium-browser", r.calls[1].binary) } - // Stage 2 must include the --print-to-pdf flag pointing at /pdf. - if !contains(r.calls[1].cmd, "--print-to-pdf=/pdf/out.pdf") { - t.Errorf("chromium call missing --print-to-pdf flag: %v", r.calls[1].cmd) + // Stage 2 must include --print-to-pdf pointing at an absolute + // path under the scratch dir. 
+ stage2 := r.calls[1] + if stage2.scratch == "" { + t.Fatalf("chromium call must have a scratch dir") } - if !contains(r.calls[1].cmd, "--no-sandbox") { - t.Errorf("chromium call missing --no-sandbox: %v", r.calls[1].cmd) + wantPDF := "--print-to-pdf=" + stage2.scratch + "/out.pdf" + if !contains(stage2.cmd, wantPDF) { + t.Errorf("chromium call missing --print-to-pdf=%s/out.pdf: %v", stage2.scratch, stage2.cmd) } - // Stage 2's bind mount must be writable (chromium writes the PDF). - if len(r.calls[1].mounts) == 0 || !strings.Contains(r.calls[1].mounts[0], ":rw") { - t.Errorf("chromium mount must be :rw, got %v", r.calls[1].mounts) + if !contains(stage2.cmd, "--no-sandbox") { + t.Errorf("chromium call missing --no-sandbox: %v", stage2.cmd) + } + // Stage 2 chromium reads in.html from the scratch dir via file://. + wantHTML := "file://" + stage2.scratch + "/in.html" + if !contains(stage2.cmd, wantHTML) { + t.Errorf("chromium call missing %q: %v", wantHTML, stage2.cmd) } } @@ -255,21 +266,6 @@ func TestMetadataArgs_OmitsEmptyAndOrdersStably(t *testing.T) { } } -func TestImageTag(t *testing.T) { - cases := map[string]string{ - "docker.io/pandoc/latex:latest": "pandoc/latex", - "docker.io/zenika/alpine-chrome:latest": "zenika/alpine-chrome", - "pandoc/core": "pandoc/core", - "quay.io/example/foo:v1": "example/foo", - "alpine": "alpine", - } - for in, want := range cases { - if got := imageTag(in); got != want { - t.Errorf("imageTag(%q) = %q, want %q", in, got, want) - } - } -} - func TestSingleflight_Collapses(t *testing.T) { var g singleflightGroup const N = 50 @@ -305,113 +301,3 @@ func contains(haystack []string, needle string) bool { } return false } - -// TestToolSpecPopulation: the convert entry points populate BOTH the -// Image and Binary fields of ToolSpec, so the runner-of-the-day can -// pick whichever it needs. bwrapRunner reads Binary; containerRunner -// reads Image; the call site doesn't know which is installed.
-func TestToolSpecPopulation(t *testing.T) { - f := &fakeRunner{resp: []byte("ok")} - InstallRunner(f) - t.Cleanup(func() { InstallRunner(nil) }) - SetImages("docker.io/pandoc/latex:1.0", "docker.io/zenika/alpine-chrome:2.0") - SetBinaries("/opt/bin/pandoc", "/opt/bin/chromium") - t.Cleanup(func() { SetImages("", ""); SetBinaries("", "") }) - - if _, err := ToDocx(context.Background(), []byte("# x\n"), Metadata{}); err != nil { - t.Fatalf("ToDocx: %v", err) - } - if len(f.tools) != 1 { - t.Fatalf("want 1 tool call, got %d", len(f.tools)) - } - got := f.tools[0] - if got.Image != "docker.io/pandoc/latex:1.0" { - t.Errorf("Image = %q, want docker.io/pandoc/latex:1.0", got.Image) - } - if got.Binary != "/opt/bin/pandoc" { - t.Errorf("Binary = %q, want /opt/bin/pandoc", got.Binary) - } -} - -// TestBwrapArgs_SandboxFlagsPresent locks in the bwrap argv shape. -// Every conversion must run with these hardening flags — the whole -// point of bwrap-as-default is that the sandbox is built into every -// invocation. A refactor that drops any of them needs to fail this -// test loudly. -func TestBwrapArgs_SandboxFlagsPresent(t *testing.T) { - args, err := buildBwrapArgs("pandoc", nil, []string{"--from=markdown", "--to=docx", "-"}) - if err != nil { - t.Fatalf("buildBwrapArgs: %v", err) - } - mustHave := []string{ - "--unshare-all", // net + pid + ipc + uts + cgroup - "--unshare-user-try", // user-namespace when kernel allows - "--die-with-parent", // cleanup when zddc-server exits - "--proc", // minimal /proc - "--dev", // minimal /dev - "--tmpfs", // writable /tmp scratch - "--clearenv", // no host env leaks - } - for _, flag := range mustHave { - if !contains(args, flag) { - t.Errorf("bwrap args missing sandbox flag %q: %v", flag, args) - } - } - // /usr must be bind-mounted read-only — that's how the binary - // + its dynamic libs are visible inside the sandbox. The - // "--ro-bind /usr /usr" triple must appear consecutively. 
- if i := indexOfTriple(args, "--ro-bind", "/usr", "/usr"); i < 0 { - t.Errorf("bwrap args missing --ro-bind /usr /usr: %v", args) - } - // Binary + caller-cmd come last, in order. - last := args[len(args)-4:] - want := []string{"pandoc", "--from=markdown", "--to=docx", "-"} - for i, w := range want { - if last[i] != w { - t.Errorf("trailing args[%d] = %q, want %q", i, last[i], w) - } - } -} - -// TestBwrapArgs_MountTranslation: caller "host:target:ro" → bwrap -// "--ro-bind host target"; "host:target:rw" → "--bind host target"; -// no mode segment defaults to ro (mirroring containerRunner). -func TestBwrapArgs_MountTranslation(t *testing.T) { - args, err := buildBwrapArgs("pandoc", - []string{"/host/tpl:/tpl:ro", "/host/pdf:/pdf:rw", "/host/x:/x"}, - nil) - if err != nil { - t.Fatalf("buildBwrapArgs: %v", err) - } - if i := indexOfTriple(args, "--ro-bind", "/host/tpl", "/tpl"); i < 0 { - t.Errorf("missing --ro-bind /host/tpl /tpl: %v", args) - } - if i := indexOfTriple(args, "--bind", "/host/pdf", "/pdf"); i < 0 { - t.Errorf("missing --bind /host/pdf /pdf: %v", args) - } - if i := indexOfTriple(args, "--ro-bind", "/host/x", "/x"); i < 0 { - t.Errorf("missing default-ro --ro-bind /host/x /x: %v", args) - } -} - -// TestBwrapArgs_RejectsBadMountSpec: a malformed mount string fails -// fast, never reaches exec. Single-segment specs (no target) and -// unknown modes both qualify. -func TestBwrapArgs_RejectsBadMountSpec(t *testing.T) { - for _, bad := range []string{"only-host", "/h:/t:weird", ""} { - if _, err := buildBwrapArgs("pandoc", []string{bad}, nil); err == nil { - t.Errorf("expected error for malformed mount %q", bad) - } - } -} - -// indexOfTriple returns the index of `a` in args such that -// args[i:i+3] == {a, b, c}, or -1. 
-func indexOfTriple(args []string, a, b, c string) int { - for i := 0; i+2 < len(args); i++ { - if args[i] == a && args[i+1] == b && args[i+2] == c { - return i - } - } - return -1 -} diff --git a/zddc/internal/convert/health.go b/zddc/internal/convert/health.go index 910ffd0..cce7c4c 100644 --- a/zddc/internal/convert/health.go +++ b/zddc/internal/convert/health.go @@ -11,51 +11,45 @@ import ( "time" ) -// remoteURL is set by Probe from cfg.ConvertPodmanSocket. Empty means -// local mode. -var remoteURL atomic.Pointer[string] - -// Capabilities is the snapshot of "can we convert right now?". The -// only hard requirement is a container runtime reachable from -// zddc-server — image presence is left to `--pull=missing` at -// conversion time, so a missing image surfaces as a normal -// ConvertError (not a probe failure). +// Capabilities is the snapshot the convert-health endpoint reports +// and the convert entry points consult before exec'ing. // -// Mode applies to OCI engines (podman/docker): "local" when the -// engine creates containers in the same process as zddc-server, -// "remote" when zddc-server is the client of a podman-system-service -// sidecar. The bwrap engine has no mode (always direct exec). +// In the runtime-image model, "Ready" means both binaries +// (pandoc + chromium) are present on PATH. Sandboxing + resource +// limits live in the wrapper scripts that PATH resolves to — out +// of zddc-server's concern. The probe doesn't try to validate +// those; if the wrapper is broken, the first conversion surfaces +// the failure as a ConvertError with the wrapper's stderr. 
type Capabilities struct { - Engine string // "bwrap" | "podman" | "docker" | "" - EngineVer string // first line of " --version" - Mode string // "local" or "remote" (OCI engines only) - RemoteURL string // populated in remote mode (OCI engines only) - PandocImage string // resolved pandoc image ref (OCI engines) - ChromiumImage string // resolved chromium image ref (OCI engines) - ProbedAt time.Time - Err error + PandocBinary string // resolved path, e.g. /usr/local/bin/pandoc + PandocVersion string // first line of "pandoc --version" + ChromiumBinary string // resolved path, e.g. /usr/local/bin/chromium-browser + ChromiumVersion string // first line of "chromium-browser --version" + ProbedAt time.Time + Err error } -// Ready reports whether conversions can be attempted. The first -// conversion may still fail if the configured binary or image isn't -// actually present (the runner will surface a clear error from the -// child process's stderr). +// Ready reports whether conversions can be attempted. func (c Capabilities) Ready() bool { - return c.Engine != "" && c.Err == nil + return c.PandocBinary != "" && c.ChromiumBinary != "" && c.Err == nil } // Reason returns a short human-friendly explanation when Ready() is // false. Used as the body of a 503. 
func (c Capabilities) Reason() string { - if c.Engine == "" { - return "no conversion sandbox found (looked for bwrap, podman, docker on PATH)" - } if c.Err != nil { - if c.Mode == "remote" { - return fmt.Sprintf("podman remote socket unreachable (%s): %s", c.RemoteURL, c.Err.Error()) - } return c.Err.Error() } + var missing []string + if c.PandocBinary == "" { + missing = append(missing, "pandoc") + } + if c.ChromiumBinary == "" { + missing = append(missing, "chromium-browser") + } + if len(missing) > 0 { + return fmt.Sprintf("conversion binary not found on PATH: %s — runtime image is missing the conversion toolchain (see zddc/runtime.Containerfile)", strings.Join(missing, ", ")) + } return "unavailable" } @@ -74,187 +68,75 @@ func Available() (Capabilities, bool) { return *p, p.Ready() } -// SetRemoteURL installs the podman remote socket URL for subsequent -// Probe / Reprobe calls. Empty means "local mode" (the engine binary -// creates containers in the same process). Called from -// cmd/zddc-server/main.go after flag parsing, before Probe. -func SetRemoteURL(url string) { - s := url - remoteURL.Store(&s) -} - -func currentRemoteURL() string { - if p := remoteURL.Load(); p != nil { - return *p - } - return "" -} - -// Probe locates the container engine and installs a containerRunner -// as the package default. Call once at server startup. Returns the -// captured Capabilities for logging. +// Probe resolves the conversion binaries on PATH and installs the +// localRunner. Call once at server startup. Returns the captured +// Capabilities for logging. // -// Engine order: engineOverride (if non-empty) → podman → docker. First -// hit wins. Image presence is NOT probed: the runner uses -// `--pull=missing` so the first conversion request will pull whichever -// image it needs. +// Image responsibility: the binaries on PATH should be the wrapper +// scripts at /usr/local/bin/{pandoc,chromium-browser} (shipped by +// zddc/runtime.Containerfile). 
Each wrapper handles cgroup setup +// + bwrap sandbox + exec of the real binary at /usr/bin/. +// If an operator runs zddc-server outside the runtime image with +// raw pandoc / chromium on PATH, the conversion still works but +// without the per-call sandbox + resource caps. // -// In remote mode (SetRemoteURL with non-empty URL), the probe also -// invokes ` --remote --url= version` to confirm the -// sidecar's socket is reachable. A reachable-engine-but-unreachable- -// socket state surfaces as Ready=false so conversion requests serve -// 503 until the sidecar comes up. -// -// Any failure here is non-fatal: the server still starts, conversion +// Failure here is non-fatal: the server still starts, conversion // endpoints just return 503. -func Probe(ctx context.Context, engineOverride string) Capabilities { +func Probe(ctx context.Context) Capabilities { probeCool.Lock() defer probeCool.Unlock() - now := time.Now() - rURL := currentRemoteURL() - c := Capabilities{ - PandocImage: currentPandocImage(), - ChromiumImage: currentChromiumImage(), - Mode: "local", - RemoteURL: rURL, - ProbedAt: now, + c := Capabilities{ProbedAt: time.Now()} + + pandocBin := currentPandocBinary() + chromiumBin := currentChromiumBinary() + + if p, err := exec.LookPath(pandocBin); err == nil { + c.PandocBinary = p + if v, err := probeVersion(ctx, p); err == nil { + c.PandocVersion = v + } } - if rURL != "" { - c.Mode = "remote" + if p, err := exec.LookPath(chromiumBin); err == nil { + c.ChromiumBinary = p + if v, err := probeVersion(ctx, p); err == nil { + c.ChromiumVersion = v + } } - enginePath := resolveEngine(engineOverride) - if enginePath == "" { - c.Err = fmt.Errorf("no conversion sandbox found (tried: %s)", strings.Join(enginesTried(engineOverride), ", ")) + if c.PandocBinary == "" || c.ChromiumBinary == "" { + c.Err = fmt.Errorf("%s", c.Reason()) caps.Store(&c) slog.Warn("convert: probe failed", "reason", c.Err.Error()) return c } - kind := engineKind(enginePath) - c.Engine = 
kind - if v, err := probeVersion(ctx, enginePath); err == nil { - c.EngineVer = v - } - - // bwrap engine: no remote-mode concept, just install the runner. - // The bwrap binary IS the sandbox; conversion binaries (pandoc, - // chromium) are resolved separately from PATH at call time and - // reported by the convert-health endpoint when ready. - if kind == "bwrap" { - InstallRunner(newBwrapRunner(enginePath)) - caps.Store(&c) - slog.Info("convert: ready", - "engine", kind, - "engine_path", enginePath, - "engine_version", c.EngineVer, - "pandoc_binary", currentPandocBinary(), - "chromium_binary", currentChromiumBinary()) - return c - } - - // Legacy OCI engine (podman/docker). Optional remote-socket - // connectivity check, then install containerRunner. - if rURL != "" { - if err := probeRemoteSocket(ctx, enginePath, rURL); err != nil { - c.Err = err - caps.Store(&c) - slog.Warn("convert: remote socket probe failed", - "engine", kind, "remote_url", rURL, "err", err) - return c - } - } - - InstallRunner(newContainerRunner(enginePath, rURL)) + InstallRunner(newLocalRunner()) caps.Store(&c) slog.Info("convert: ready", - "engine", kind, - "engine_path", enginePath, - "engine_version", c.EngineVer, - "mode", c.Mode, - "remote_url", c.RemoteURL, - "pandoc_image", c.PandocImage, - "chromium_image", c.ChromiumImage) + "pandoc_binary", c.PandocBinary, + "pandoc_version", c.PandocVersion, + "chromium_binary", c.ChromiumBinary, + "chromium_version", c.ChromiumVersion) return c } -// probeRemoteSocket runs ` --remote --url= version` with -// a short timeout. Returns nil on success; a wrapped error otherwise. -// The remote URL is typically a Unix socket path -// (unix:///var/run/podman/podman.sock) in the sidecar pattern but a -// TCP form (tcp://host:port) is accepted too. 
-func probeRemoteSocket(ctx context.Context, engine, url string) error { - c := exec.CommandContext(ctx, engine, "--remote", "--url="+url, "version", "--format={{.Client.Version}}") - out, err := c.CombinedOutput() - if err != nil { - return fmt.Errorf("podman --remote version: %w (output: %s)", err, strings.TrimSpace(string(out))) - } - return nil -} - -// Reprobe re-runs Probe with the existing configuration. Used by the -// handler when a request hits a not-Ready state — gives the operator -// a way to recover (e.g. installed podman after the server started) -// without a server restart. Cooldown of 60 s between probes to keep -// error-path requests cheap. -func Reprobe(ctx context.Context, engineOverride string) Capabilities { +// Reprobe re-runs Probe with the existing configuration. Used by +// the handler when a request hits a not-Ready state — gives the +// operator a way to recover (e.g. installed pandoc after server +// start) without a server restart. Cooldown of 60 s between probes +// to keep error-path requests cheap. +func Reprobe(ctx context.Context) Capabilities { if p := caps.Load(); p != nil { if time.Since(p.ProbedAt) < 60*time.Second { return *p } } - return Probe(ctx, engineOverride) + return Probe(ctx) } -func resolveEngine(override string) string { - if override != "" { - if p, err := exec.LookPath(override); err == nil { - return p - } - return "" - } - // Probe order: bwrap (production default — lightest sandbox, no - // daemon, no OCI engine), then podman / docker as legacy fallbacks - // for hosts that already have a container engine and want OCI-image - // isolation per conversion. 
- for _, name := range []string{"bwrap", "podman", "docker"} { - if p, err := exec.LookPath(name); err == nil { - return p - } - } - return "" -} - -func enginesTried(override string) []string { - if override != "" { - return []string{override} - } - return []string{"bwrap", "podman", "docker"} -} - -// engineKind returns the engine-family label for a resolved binary -// path. "bwrap" is its own engine; "podman" and "docker" are the -// OCI-container engines handled by containerRunner. Used by Probe to -// pick the right Runner implementation. -func engineKind(resolved string) string { - base := resolved - if i := strings.LastIndex(base, "/"); i >= 0 { - base = base[i+1:] - } - switch base { - case "bwrap": - return "bwrap" - case "podman", "podman-remote": - return "podman" - case "docker": - return "docker" - } - return base -} - -func probeVersion(ctx context.Context, engine string) (string, error) { - c := exec.CommandContext(ctx, engine, "--version") +func probeVersion(ctx context.Context, binary string) (string, error) { + c := exec.CommandContext(ctx, binary, "--version") out, err := c.CombinedOutput() if err != nil { return "", err diff --git a/zddc/internal/convert/runner.go b/zddc/internal/convert/runner.go index 431d306..5dd3d26 100644 --- a/zddc/internal/convert/runner.go +++ b/zddc/internal/convert/runner.go @@ -10,60 +10,45 @@ import ( "os" "os/exec" "path/filepath" - "strings" "sync" "time" ) -// ToolSpec identifies the conversion tool to invoke. Runners pick -// whichever field applies to them: +// Runner executes a conversion binary and returns its stdout. The +// production implementation (localRunner) just exec's the binary +// directly. Tests use a fake. // -// - bwrapRunner uses Binary — the path or PATH-name of the tool on -// the zddc-server host (or container). pandoc/latex's entrypoint -// becomes `pandoc`; alpine-chrome's becomes `chromium-browser`. 
-// This is the production-default engine: lightest sandbox, no -// daemon, no privileged outer container. +// binary is the PATH-resolvable name (or absolute path) of the +// conversion tool — typically "pandoc" or "chromium-browser". In the +// production runtime image those names resolve to wrapper scripts at +// /usr/local/bin/ that put the real binary into a cgroup + bwrap +// sandbox before exec'ing it. From zddc-server's perspective, that +// indirection is invisible: it just sees pandoc behavior. // -// - containerRunner uses Image — the OCI image ref pulled into a -// fresh container for each conversion (legacy/fallback engine, -// kept for environments that already host a podman/docker daemon -// and want OCI-image isolation per conversion). +// stdin is piped to the binary's stdin. scratchDir is an optional +// host directory the binary needs to read from / write to (template +// + intermediate HTML + PDF output); passed to the child via the +// ZDDC_SCRATCH env var, which the wrapper script bind-mounts into +// the sandbox at the same path. Empty means "no scratch dir +// needed" (DOCX flow — stdin to stdout, no files). // -// Both fields are populated by the entry points in convert.go so a -// single call site works regardless of which engine is installed. -type ToolSpec struct { - Image string // OCI image ref (containerRunner) - Binary string // binary name on PATH (bwrapRunner) -} - -// Runner executes a conversion sub-process and returns its stdout. -// The host-side implementations are bwrapRunner (default; wraps -// `bubblewrap`) and containerRunner (fallback; wraps `podman run` / -// `docker run`). Tests use a fake. +// cmd is the argv passed to the binary. Same shape across all +// runners; no shell quoting; no engine-specific flags. // -// stdin is piped to the tool's stdin. cmd is the argv passed *to the -// tool* — for pandoc the entrypoint accepts pandoc flags directly; -// for chromium it accepts chromium-browser flags. 
mounts is a list -// of ":" specs (":ro" is added if no mode -// segment is present); each runner translates them to its own -// bind/--volume syntax. -// -// All exec calls in this package go through Runner.Run. This is the -// first os/exec site in the codebase; the hardening here is the -// pattern for future shell-outs. +// All exec calls in this package go through Runner.Run. type Runner interface { - Run(ctx context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error) + Run(ctx context.Context, binary string, stdin []byte, scratchDir string, cmd []string) ([]byte, error) } -// ErrUnavailable means no container runtime is present on the host. -// Handlers translate to HTTP 503. +// ErrUnavailable means the conversion binary couldn't be found on +// PATH. Handlers translate to HTTP 503. var ErrUnavailable = errors.New("conversion unavailable") // ConvertError carries the failure surface from a non-zero exit. -// Stderr is captured (truncated to 4 KiB by the runner) so callers can -// surface pandoc/chromium's own complaint. +// Stderr is captured (truncated to 4 KiB by the runner) so callers +// can surface the binary's own complaint. type ConvertError struct { - Tool string // image name fragment, used only for logging + Tool string // binary name, used only for logging ExitCode int Stderr string Cause error @@ -74,78 +59,154 @@ func (e *ConvertError) Error() string { return "" } if e.Stderr != "" { - return fmt.Sprintf("%s exit %d: %s", e.Tool, e.ExitCode, strings.TrimSpace(e.Stderr)) + return fmt.Sprintf("%s exit %d: %s", e.Tool, e.ExitCode, e.Stderr) } return fmt.Sprintf("%s exit %d: %v", e.Tool, e.ExitCode, e.Cause) } func (e *ConvertError) Unwrap() error { return e.Cause } -// containerRunner runs each conversion inside a fresh container. -// The engine ("podman" preferred, "docker" fallback) is resolved once -// at startup by Probe. 
Resource limits are configurable via -// SetLimits (called from main.go after flag parsing). Images are passed -// per call so the same runner handles both pandoc and chromium -// invocations. +// localRunner exec's the conversion binary directly. The runtime +// image's wrapper script (at /usr/local/bin/) handles +// sandboxing + resource limits BETWEEN this exec and the real +// binary — invisible to this Runner. // -// Two modes: -// -// - **local** (remoteURL=""): the engine binary creates containers -// directly on the host that runs zddc-server. Used for bare-metal -// and host-podman deployments. Requires podman or docker on PATH. -// -// - **remote** (remoteURL="unix:///var/run/podman/podman.sock" or -// similar): the engine binary is the local podman CLIENT, invoked -// as `podman --remote --url= run …`; the actual -// container creation happens in whatever process owns the socket -// (typically a `podman system service` sidecar in the same pod). -// Used for the Kubernetes sidecar pattern so zddc-server's own -// pod stays unprivileged. Bind-mount paths must resolve identically -// on both sides — see scratchDir. -// -// The runner relies on `--pull=missing` so the operator never has to -// pre-pull images: the first request that needs an image pulls it, -// subsequent requests use the local cache. Both podman and docker -// honour this flag identically. -type containerRunner struct { - mu sync.RWMutex - engine string - remoteURL string - memMiB int - cpus string - pids int - timeout time.Duration +// Memory and pids limits are not enforced here: they are passed +// to the wrapper via env (ZDDC_CONV_MEM_MAX, ZDDC_CONV_PIDS_MAX), +// which applies them to its transient cgroup. Wall-clock timeout +// IS enforced here via context.WithTimeout.
+type localRunner struct { + mu sync.RWMutex + memMiB int + pids int + timeout time.Duration +} + +func newLocalRunner() *localRunner { + return &localRunner{ + memMiB: 1024, // 1 GiB — matches the wrapper's default + pids: 256, + timeout: 60 * time.Second, + } +} + +// SetLimits updates the resource ceilings advertised to the wrapper +// script via env vars + the wall-clock timeout enforced here. +// Zero values keep the previous setting (or constructor defaults). +// Safe to call from multiple goroutines. +func (lr *localRunner) SetLimits(memMiB int, pids int, timeout time.Duration) { + lr.mu.Lock() + defer lr.mu.Unlock() + if memMiB > 0 { + lr.memMiB = memMiB + } + if pids > 0 { + lr.pids = pids + } + if timeout > 0 { + lr.timeout = timeout + } +} + +func (lr *localRunner) Run(ctx context.Context, binary string, stdin []byte, scratchDir string, cmd []string) ([]byte, error) { + lr.mu.RLock() + memMiB := lr.memMiB + pids := lr.pids + timeout := lr.timeout + lr.mu.RUnlock() + + if binary == "" { + return nil, ErrUnavailable + } + + runCtx, cancel := context.WithTimeout(ctx, timeout) + defer cancel() + + c := exec.CommandContext(runCtx, binary, cmd...) + c.Cancel = func() error { + if c.Process == nil { + return nil + } + return c.Process.Kill() + } + c.WaitDelay = 2 * time.Second + c.SysProcAttr = sysProcAttr() + + // Minimal env passed to the wrapper. The wrapper does + // --clearenv inside the bwrap sandbox so the real binary + // sees only what bwrap re-injects (HOME, PATH, LANG). These + // vars are read by the WRAPPER itself, not the binary, to + // drive its cgroup setup + scratch-dir bind mount. 
+ env := []string{ + "PATH=" + os.Getenv("PATH"), + "HOME=" + os.TempDir(), + fmt.Sprintf("ZDDC_CONV_MEM_MAX=%dM", memMiB), + fmt.Sprintf("ZDDC_CONV_PIDS_MAX=%d", pids), + } + if scratchDir != "" { + env = append(env, "ZDDC_SCRATCH="+scratchDir) + } + c.Env = env + c.Stdin = bytes.NewReader(stdin) + + var stdoutBuf bytes.Buffer + c.Stdout = &limitWriter{w: &stdoutBuf, max: 128 << 20} + stderr := newRingWriter(4 << 10) + c.Stderr = stderr + + if err := c.Run(); err != nil { + exitCode := -1 + if ee, ok := err.(*exec.ExitError); ok { + exitCode = ee.ExitCode() + } + if runCtx.Err() == context.DeadlineExceeded { + return nil, &ConvertError{ + Tool: binary, + ExitCode: exitCode, + Stderr: stderr.String(), + Cause: fmt.Errorf("timeout after %s: %w", timeout, runCtx.Err()), + } + } + return nil, &ConvertError{ + Tool: binary, + ExitCode: exitCode, + Stderr: stderr.String(), + Cause: err, + } + } + return stdoutBuf.Bytes(), nil } var ( // shared default runner, populated by InstallRunner (called from - // the health probe at startup once the engine is known). + // the health probe at startup once the binaries are confirmed). defaultRunnerMu sync.RWMutex defaultRunner Runner ) -// InstallRunner sets the package-level Runner used by ToDocx/ToHTML/ToPDF. -// Tests inject a fake; production code lets the health probe install a -// containerRunner. Safe to call from multiple goroutines. +// InstallRunner sets the package-level Runner used by ToDocx/ToHTML/ +// ToPDF. Tests inject a fake; production code lets the health probe +// install a localRunner. Safe to call from multiple goroutines. func InstallRunner(r Runner) { defaultRunnerMu.Lock() defaultRunner = r defaultRunnerMu.Unlock() } -// ConfigureLimits applies resource limits to the package-level Runner, -// if it's a containerRunner. 
No-op when no runner is installed yet -// (the probe failed) or when the installed runner doesn't accept +// ConfigureLimits applies resource limits to the package-level +// Runner, if it's a localRunner. No-op when no runner is installed +// yet (the probe failed) or when the installed runner doesn't accept // limits (e.g. a test fake). Zero values keep the previous setting. // -// Called from cmd/zddc-server/main.go after Probe so the limits from -// the operator's flags take effect before any conversion request lands. -func ConfigureLimits(memMiB int, cpus string, pids int, timeout time.Duration) { +// Called from cmd/zddc-server/main.go after Probe so the limits +// from the operator's flags take effect before any conversion +// request lands. +func ConfigureLimits(memMiB int, pids int, timeout time.Duration) { defaultRunnerMu.RLock() r := defaultRunner defaultRunnerMu.RUnlock() - if cr, ok := r.(*containerRunner); ok { - cr.SetLimits(memMiB, cpus, pids, timeout) + if lr, ok := r.(*localRunner); ok { + lr.SetLimits(memMiB, pids, timeout) } } @@ -156,428 +217,8 @@ func currentRunner() Runner { return r } -// SetLimits updates the resource ceilings used for subsequent Run -// invocations. Zero values keep the previous setting (or the defaults -// set at construction). Safe to call from multiple goroutines. -func (cr *containerRunner) SetLimits(memMiB int, cpus string, pids int, timeout time.Duration) { - cr.mu.Lock() - defer cr.mu.Unlock() - if memMiB > 0 { - cr.memMiB = memMiB - } - if cpus != "" { - cr.cpus = cpus - } - if pids > 0 { - cr.pids = pids - } - if timeout > 0 { - cr.timeout = timeout - } -} - -func newContainerRunner(engine, remoteURL string) *containerRunner { - return &containerRunner{ - engine: engine, - remoteURL: remoteURL, - memMiB: 512, - cpus: "2", - pids: 100, - timeout: 30 * time.Second, - } -} - -// Run executes one container invocation. 
cmd is the argv passed to the -// image's entrypoint (pandoc for pandoc/latex, chromium-browser for -// alpine-chrome). mounts is a list of ":" -// strings; ":ro" is appended when no mode segment is present. stdin is -// piped to the container, stdout is returned as bytes (capped at -// 128 MiB). -// -// Hardening: -// - --pull=missing: image is fetched on first use, cached after. -// Operator only needs podman/docker installed; no manual pull. -// - --rm: container is removed on exit, even if killed. -// - --network=none: no network inside the container. Prevents data -// exfiltration through embedded URLs in source documents. -// - --read-only + tmpfs on /tmp and /run: image fs is immutable; -// pandoc/chromium scratch goes to tmpfs only. -// - --memory / --cpus / --pids-limit: kernel-enforced caps. -// - --cap-drop=ALL + --security-opt=no-new-privileges: standard -// container-escape hardening. -// - context-cancel kill + WaitDelay: a wedged podman gets force- -// killed; pipes drop after 2s so we don't leak goroutines. -// - cmd.Env minimal: only PATH + HOME are passed through to the -// engine binary; the container itself sees only what the image -// bakes in plus what --env adds (HOME=/tmp). -// -// Note: --user is intentionally NOT set so each image uses its -// default user (pandoc/latex runs as root, alpine-chrome runs as -// uid 1000). With --read-only + tmpfs + --cap-drop=ALL + -// --network=none + --no-new-privileges the additional defense from -// forcing nobody is small and would break alpine-chrome's own -// user-data-dir layout. 
-func (cr *containerRunner) Run(ctx context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error) { - cr.mu.RLock() - engine := cr.engine - remoteURL := cr.remoteURL - memMiB := cr.memMiB - cpus := cr.cpus - pids := cr.pids - timeout := cr.timeout - cr.mu.RUnlock() - - if engine == "" { - return nil, ErrUnavailable - } - image := tool.Image - if image == "" { - return nil, fmt.Errorf("convert.Run: tool.Image is empty (containerRunner requires an OCI image ref)") - } - - runCtx, cancel := context.WithTimeout(ctx, timeout) - defer cancel() - - // Client args. In remote mode, prepend --remote and --url so the - // podman CLI dispatches the request to the sidecar's - // `podman system service` instead of creating a container locally. - // The remaining flags (--rm, --pull=missing, etc.) apply to the - // container that the remote daemon will create — same wire format - // as local mode. - var args []string - if remoteURL != "" { - args = append(args, "--remote", "--url="+remoteURL) - } - args = append(args, - "run", - "--rm", - "--pull=missing", - "-i", - ) - // --userns=host only in local mode: needed when zddc-server itself - // is the one running podman inside a Kubernetes pod, because the - // kernel won't let an inner rootless podman set up its own userns - // via newuidmap. In remote (sidecar) mode the sidecar runs as root - // and creates the inner container in its own (rootful) namespace, - // so --userns=host is unnecessary and potentially noisy. - if remoteURL == "" { - args = append(args, "--userns=host") - } - args = append(args, - "--network=none", - "--read-only", - // /tmp must be large enough to host chromium's shared-memory - // fallback (--disable-dev-shm-usage redirects /dev/shm writes - // here) plus the user-data-dir. 256 MiB is plenty for the - // HTML→PDF flow; pandoc itself uses almost none. 
- "--tmpfs=/tmp:size=256m,exec", - "--tmpfs=/run:size=4m", - fmt.Sprintf("--memory=%dm", memMiB), - fmt.Sprintf("--cpus=%s", cpus), - fmt.Sprintf("--pids-limit=%d", pids), - "--cap-drop=ALL", - "--security-opt=no-new-privileges", - "--env=HOME=/tmp", - "--workdir=/tmp", - ) - for _, m := range mounts { - if !strings.Contains(m, ":ro") && !strings.Contains(m, ":rw") { - m += ":ro" - } - args = append(args, "--volume="+m) - } - args = append(args, image) - args = append(args, cmd...) - - c := exec.CommandContext(runCtx, engine, args...) - c.Cancel = func() error { - if c.Process == nil { - return nil - } - return c.Process.Kill() - } - c.WaitDelay = 2 * time.Second - c.SysProcAttr = sysProcAttr() - c.Env = []string{ - "PATH=" + os.Getenv("PATH"), - "HOME=" + os.TempDir(), - } - c.Stdin = bytes.NewReader(stdin) - - var stdoutBuf bytes.Buffer - c.Stdout = &limitWriter{w: &stdoutBuf, max: 128 << 20} - stderr := newRingWriter(4 << 10) - c.Stderr = stderr - - err := c.Run() - if err != nil { - exitCode := -1 - if ee, ok := err.(*exec.ExitError); ok { - exitCode = ee.ExitCode() - } - toolName := imageTag(image) - if runCtx.Err() == context.DeadlineExceeded { - return nil, &ConvertError{ - Tool: toolName, - ExitCode: exitCode, - Stderr: stderr.String(), - Cause: fmt.Errorf("timeout after %s: %w", timeout, runCtx.Err()), - } - } - return nil, &ConvertError{ - Tool: toolName, - ExitCode: exitCode, - Stderr: stderr.String(), - Cause: err, - } - } - - return stdoutBuf.Bytes(), nil -} - -// ─────────────────────────────────────────────────────────────────────────── -// bwrapRunner — default conversion engine. -// -// Wraps `bubblewrap` to run pandoc / chromium binaries directly in a -// per-call Linux-namespace sandbox. No daemon, no OCI images, no -// privileged outer container. 
Image-build bundles pandoc + chromium -// into the zddc-server image so the binaries are available on PATH; -// each conversion gets a fresh set of namespaces, a read-only view -// of the host's /usr (so the binary + its libs are visible), a tmpfs -// /tmp, and nothing else. -// -// This matches the threat model of the legacy containerRunner — -// untrusted source-markdown drives the binary, we contain any -// resulting RCE inside the bwrap sandbox — without the operational -// tax of running a container engine per conversion (image pull, -// daemon, socket, ~300ms startup). -// -// Hardening (mirror of containerRunner's flags): -// - --unshare-all + --share-net=off via omission → no network -// - --unshare-user-try → user namespace when kernel allows it -// - --die-with-parent → cleanup on zddc-server exit -// - --ro-bind /usr /usr, /lib /lib, /lib64 /lib64, /etc /etc, /bin /bin -// (where present) → tools + libs visible read-only -// - --proc /proc, --dev /dev → minimal pseudo-filesystems -// - --tmpfs /tmp (256 MiB) → scratch space, matches container path -// - --chdir /tmp → workdir -// - --clearenv + minimal HOME/PATH/LANG → no host env leaks -// - --cap-drop ALL (bwrap default, explicit for clarity) -// ─────────────────────────────────────────────────────────────────────────── - -type bwrapRunner struct { - mu sync.RWMutex - bin string // path to bwrap binary - memMiB int // currently advisory; bwrap has no built-in cap - cpus string // currently advisory - pids int // currently advisory - timeout time.Duration // context deadline per Run -} - -func newBwrapRunner(bin string) *bwrapRunner { - return &bwrapRunner{ - bin: bin, - memMiB: 512, - cpus: "2", - pids: 100, - timeout: 30 * time.Second, - } -} - -// SetLimits — same shape as containerRunner.SetLimits. bwrap itself -// doesn't enforce cgroup limits; we capture the values so an operator -// can read them back via /.profile/config or the convert-health probe. 
-// Wrapping with systemd-run --scope --property MemoryMax=… is the -// follow-up if hard caps are needed; not in this iteration. -func (br *bwrapRunner) SetLimits(memMiB int, cpus string, pids int, timeout time.Duration) { - br.mu.Lock() - defer br.mu.Unlock() - if memMiB > 0 { - br.memMiB = memMiB - } - if cpus != "" { - br.cpus = cpus - } - if pids > 0 { - br.pids = pids - } - if timeout > 0 { - br.timeout = timeout - } -} - -func (br *bwrapRunner) Run(ctx context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error) { - br.mu.RLock() - bwrapBin := br.bin - timeout := br.timeout - br.mu.RUnlock() - - if bwrapBin == "" { - return nil, ErrUnavailable - } - if tool.Binary == "" { - return nil, fmt.Errorf("convert.Run: tool.Binary is empty (bwrapRunner requires a host-binary name)") - } - - runCtx, cancel := context.WithTimeout(ctx, timeout) - defer cancel() - - args, err := buildBwrapArgs(tool.Binary, mounts, cmd) - if err != nil { - return nil, err - } - - c := exec.CommandContext(runCtx, bwrapBin, args...) 
- c.Cancel = func() error { - if c.Process == nil { - return nil - } - return c.Process.Kill() - } - c.WaitDelay = 2 * time.Second - c.SysProcAttr = sysProcAttr() - c.Env = []string{ - "PATH=" + os.Getenv("PATH"), - "HOME=" + os.TempDir(), - } - c.Stdin = bytes.NewReader(stdin) - - var stdoutBuf bytes.Buffer - c.Stdout = &limitWriter{w: &stdoutBuf, max: 128 << 20} - stderr := newRingWriter(4 << 10) - c.Stderr = stderr - - if runErr := c.Run(); runErr != nil { - exitCode := -1 - if ee, ok := runErr.(*exec.ExitError); ok { - exitCode = ee.ExitCode() - } - toolName := tool.Binary - if runCtx.Err() == context.DeadlineExceeded { - return nil, &ConvertError{ - Tool: toolName, - ExitCode: exitCode, - Stderr: stderr.String(), - Cause: fmt.Errorf("timeout after %s: %w", timeout, runCtx.Err()), - } - } - return nil, &ConvertError{ - Tool: toolName, - ExitCode: exitCode, - Stderr: stderr.String(), - Cause: runErr, - } - } - return stdoutBuf.Bytes(), nil -} - -// buildBwrapArgs assembles the bwrap argv for a single conversion. -// Exposed as a package-internal helper so tests can lock the sandbox -// flag shape without exec'ing bwrap. Returns an error when a mount -// spec is malformed. -func buildBwrapArgs(binary string, mounts, cmd []string) ([]string, error) { - args := []string{ - // Namespace isolation. --unshare-all unshares user (when - // available), ipc, pid, net, uts, cgroup; --unshare-user-try - // downgrades cleanly when the kernel refuses (e.g. some - // container hosts disable user-namespace creation). - "--unshare-all", - "--unshare-user-try", - "--die-with-parent", - // Read-only system view. Each --ro-bind only mounts paths - // that exist on the host; for hosts where /lib is a symlink - // into /usr/lib (modern Linux) the symlink resolution lets - // bwrap mount /usr's contents through. 
- "--ro-bind", "/usr", "/usr", - "--ro-bind-try", "/lib", "/lib", - "--ro-bind-try", "/lib64", "/lib64", - "--ro-bind-try", "/bin", "/bin", - "--ro-bind-try", "/sbin", "/sbin", - "--ro-bind-try", "/etc", "/etc", - // Pseudo-filesystems. /proc and /dev are required for any - // non-trivial binary; we make them minimal. - "--proc", "/proc", - "--dev", "/dev", - // Scratch. 256 MiB tmpfs at /tmp matches containerRunner. - // chromium spills its shared-memory fallback (--disable-dev- - // shm-usage) here, so the budget actually matters. - "--tmpfs", "/tmp", - "--size", "268435456", // 256 MiB; applies to the most recent --tmpfs - "--chdir", "/tmp", - // Minimal env. HOME=/tmp lets chromium write its - // user-data-dir without permission errors; PATH covers the - // usual install locations for pandoc + chromium across - // alpine / debian / rhel. - "--clearenv", - "--setenv", "HOME", "/tmp", - "--setenv", "PATH", "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", - "--setenv", "LANG", "C.UTF-8", - } - // Caller-supplied bind mounts (template, output, …). Same - // "host:target[:ro|:rw]" syntax as containerRunner; we translate - // to bwrap's --ro-bind / --bind. - for _, m := range mounts { - host, target, mode, ok := splitMount(m) - if !ok { - return nil, fmt.Errorf("convert.Run: invalid mount spec %q (want host:target[:ro|:rw])", m) - } - if mode == "rw" { - args = append(args, "--bind", host, target) - } else { - args = append(args, "--ro-bind", host, target) - } - } - // Finally the binary + its argv. The binary path is PATH-resolved - // inside the sandbox via the constructed PATH above; if the - // operator passed an absolute path it bypasses PATH lookup and is - // invoked verbatim (still subject to the /usr bind mount). - args = append(args, binary) - args = append(args, cmd...) - return args, nil -} - -// splitMount parses "host:target[:ro|:rw]" into its three parts. 
-// The mode segment is optional; absent means read-only (matches the -// containerRunner default). -func splitMount(m string) (host, target, mode string, ok bool) { - parts := strings.SplitN(m, ":", 3) - if len(parts) < 2 { - return "", "", "", false - } - host = parts[0] - target = parts[1] - mode = "ro" - if len(parts) == 3 { - switch parts[2] { - case "ro", "rw": - mode = parts[2] - default: - return "", "", "", false - } - } - return host, target, mode, true -} - -// imageTag extracts a short name for an image reference, used as the -// "Tool" label on ConvertError. "docker.io/pandoc/latex:latest" → -// "pandoc/latex". -func imageTag(image string) string { - s := image - // Strip registry prefix. - if i := strings.Index(s, "/"); i >= 0 { - if strings.Contains(s[:i], ".") || strings.Contains(s[:i], ":") { - s = s[i+1:] - } - } - // Strip tag suffix. - if i := strings.LastIndex(s, ":"); i >= 0 { - s = s[:i] - } - return s -} - -// limitWriter caps the underlying buffer at max bytes. Writes past the -// cap return io.ErrShortWrite, which surfaces as a Run() error — the +// limitWriter caps the underlying buffer at max bytes. Writes past +// the cap return an error which surfaces as a Run() error — the // caller then maps to 422 (output too large) at the handler edge. type limitWriter struct { w io.Writer @@ -600,9 +241,9 @@ func (l *limitWriter) Write(p []byte) (int, error) { return n, err } -// ringWriter keeps only the tail of what's written — useful for stderr -// capture where the most-recent bytes are the ones with the actual -// error message and earlier output is usually progress noise. +// ringWriter keeps only the tail of what's written — useful for +// stderr capture where the most-recent bytes carry the actual error +// message and earlier output is usually progress noise. 
type ringWriter struct { mu sync.Mutex buf []byte @@ -636,16 +277,14 @@ func (r *ringWriter) String() string { // writeAssetsToScratch materialises the embedded viewer-template.html // and custom.css into a fresh scratch dir and returns the host path. // Caller is responsible for os.RemoveAll(dir) when done. Used by -// ToHTML which needs the template visible inside the container. +// ToHTML which needs the template visible inside the sandbox. // -// scratchRoot controls where the temp dir lands. Empty means "use -// $TMPDIR" (local mode default). In remote/sidecar mode the caller -// passes the shared mount path (e.g. "/work") so the podman-service -// sidecar sees the bind-mount source at the same path. +// scratchRoot controls where the temp dir lands. Empty means +// "use $TMPDIR". // -// Files are written world-readable so the container's default user -// (root for pandoc/latex, uid 1000 for alpine-chrome) can read them -// through the read-only bind mount regardless of the host's umask. +// Files are written world-readable so the binary's default user can +// read them through the wrapper's bind mount regardless of the +// host's umask. func writeAssetsToScratch(scratchRoot string) (string, error) { dir, err := os.MkdirTemp(scratchRoot, "zddc-convert-") if err != nil { diff --git a/zddc/internal/handler/converthandler.go b/zddc/internal/handler/converthandler.go index b809b7c..1c95c29 100644 --- a/zddc/internal/handler/converthandler.go +++ b/zddc/internal/handler/converthandler.go @@ -97,7 +97,7 @@ func ServeConverted(cfg config.Config, w http.ResponseWriter, r *http.Request, s if !ok { // One re-probe attempt — gives the operator a way to recover // after building the image without restarting the server. 
- caps = convert.Reprobe(r.Context(), os.Getenv("ZDDC_CONVERT_ENGINE")) + caps = convert.Reprobe(r.Context()) if !caps.Ready() { w.Header().Set("Retry-After", "60") http.Error(w, "Service Unavailable — "+caps.Reason(), http.StatusServiceUnavailable) diff --git a/zddc/internal/handler/tables.html b/zddc/internal/handler/tables.html index e131876..4a0b219 100644 --- a/zddc/internal/handler/tables.html +++ b/zddc/internal/handler/tables.html @@ -1511,7 +1511,7 @@ body.is-elevated::after {
ZDDC Table - v0.0.17-alpha · 2026-05-19 11:59:55 · 73e34be-dirty + v0.0.17-alpha · 2026-05-19 12:37:53 · 847e082-dirty
diff --git a/zddc/runtime.Containerfile b/zddc/runtime.Containerfile index 889852f..157c053 100644 --- a/zddc/runtime.Containerfile +++ b/zddc/runtime.Containerfile @@ -1,18 +1,30 @@ # Runtime image for zddc-server. # -# Bundles the conversion toolchain (pandoc + chromium + bubblewrap) so -# the MD→DOCX/HTML/PDF endpoint works without an external container -# engine. The convert package's bwrap engine (production default) -# sandboxes each pandoc/chromium invocation in a fresh Linux-namespace; -# no daemon, no socket, no privileged outer container, no OCI image -# pull at conversion time. +# Bundles the conversion toolchain (pandoc + chromium + bubblewrap) +# AND two wrapper scripts that shadow the real binaries on PATH. +# When zddc-server exec's "pandoc" or "chromium-browser", it hits +# /usr/local/bin/pandoc, a symlink to the copy of +# runtime/zddc-sandbox-exec in /usr/local/libexec/, which: +# +# 1. creates a transient cgroup v2 with memory + pids caps, +# 2. drops the process into that cgroup, +# 3. wraps the real binary in a bubblewrap sandbox (private +# namespaces, read-only /usr, fresh tmpfs at /tmp, no network), +# 4. exec's /usr/bin/. +# +# zddc-server's Go code is unaware of any of this — its only contract +# is "if I exec pandoc with these args, I get pandoc behavior." The +# isolation strategy lives entirely in the image; an operator who +# wants firejail / systemd-nspawn / podman-run instead just replaces +# the wrapper script; the Go code needs no changes. # # Used by helm charts (helm/zddc-server-prod/) as the main-container -# image. The build is independent of zddc-server itself — the binary -# is built by the helm chart's init container from a pinned git ref -# and copied into this runtime image's filesystem at start. Image -# tags should track the upstream package versions (pandoc, chromium) -# more than zddc-server, since the binary is layered in at deploy time. +# image.
The binary is built by the chart's init container from a +# pinned git ref and copied into a shared emptyDir; the chart's +# command is /usr/local/libexec/zddc-cgroup-init /zddc/zddc-server, +# so the cgroup v2 hierarchy is delegated before zddc-server starts +# (see runtime/zddc-cgroup-init for the "no internal processes" +# constraint that requires this indirection). # # Build: # podman build -t zddc-server-runtime:latest \ @@ -23,8 +35,7 @@ # codeberg.org/varasys/zddc-server-runtime:vYYYYMMDD # podman push codeberg.org/varasys/zddc-server-runtime:vYYYYMMDD # -# Size: ≈ 1 GB unpacked (chromium dominates). Container engines -# layer + dedupe the chromium libs across replicas on the same node. +# Size: ≈ 1 GB unpacked (chromium dominates). FROM docker.io/library/alpine:3 RUN apk add --no-cache \ @@ -34,8 +45,12 @@ RUN apk add --no-cache \ font-noto \ ca-certificates -# The init container in helm/zddc-server-*/templates/deployment.yaml -# writes the compiled zddc-server binary to /zddc/zddc-server in a -# shared emptyDir volume; the main container's command is -# `/zddc/zddc-server`. No CMD/ENTRYPOINT here because the binary -# path is provided by the chart, not baked into the image. +# Wrapper scripts. zddc-cgroup-init runs at container start to +# prepare cgroup v2 subtree_control delegation; zddc-sandbox-exec +# is invoked per-conversion via the symlinks below. 
+COPY runtime/zddc-cgroup-init /usr/local/libexec/zddc-cgroup-init
+COPY runtime/zddc-sandbox-exec /usr/local/libexec/zddc-sandbox-exec
+RUN chmod 0755 /usr/local/libexec/zddc-cgroup-init \
+    /usr/local/libexec/zddc-sandbox-exec \
+    && ln -s /usr/local/libexec/zddc-sandbox-exec /usr/local/bin/pandoc \
+    && ln -s /usr/local/libexec/zddc-sandbox-exec /usr/local/bin/chromium-browser
diff --git a/zddc/runtime/zddc-cgroup-init b/zddc/runtime/zddc-cgroup-init
new file mode 100755
index 0000000..bc69ffa
--- /dev/null
+++ b/zddc/runtime/zddc-cgroup-init
@@ -0,0 +1,82 @@
+#!/bin/sh
+# zddc-cgroup-init — prepare cgroup v2 hierarchy and exec zddc-server.
+#
+# The per-conversion wrapper (zddc-sandbox-exec) creates a transient
+# child cgroup for each pandoc / chromium invocation, sets memory.max
+# and pids.max on it, and moves the conversion process in. That only
+# works when:
+#
+#   (a) the cgroup v2 hierarchy is mounted at /sys/fs/cgroup, AND
+#   (b) the controllers we need (memory, pids) are enabled in the
+#       parent cgroup's subtree_control file, AND
+#   (c) the parent cgroup has NO processes in it (cgroup v2's
+#       "no internal processes" constraint: a cgroup can have
+#       children OR processes, not both).
+#
+# A bare container with PID 1 in the root cgroup violates (c). This
+# init script does the one-time setup BEFORE exec'ing zddc-server:
+#
+#   1. mkdir /sys/fs/cgroup/zddc/ (a leaf cgroup for zddc-server)
+#   2. move every PID out of root into /sys/fs/cgroup/zddc/
+#   3. enable +memory +pids in root's subtree_control (now empty)
+#   4. try +memory +pids in zddc/'s subtree_control too; expect this
+#      to fail with EBUSY while zddc/ holds processes, which is fine
+#      because the per-conversion cgroups are siblings of zddc/
+#   5. exec zddc-server (which inherits cgroup membership in zddc/)
+#
+# After this, the wrapper script creates /sys/fs/cgroup/conv./
+# as a sibling of /sys/fs/cgroup/zddc/, sets limits, and moves the
+# pandoc/chromium process in. Each conversion gets its own cgroup;
+# the kernel does not delete it on exit, so stale ones need an rmdir.
+#
+# Best-effort: if any step fails (cgroup v1, undelegated subtree,
+# read-only cgroupfs in some other container shape), this script
+# still exec's zddc-server. The convert pipeline degrades to
+# "bwrap sandbox + wall-clock timeout"; an operator notices via
+# the warning log line below.
+
+set -eu
+
+setup_cgroup_v2() {
+    cgroot=/sys/fs/cgroup
+    [ -d "$cgroot" ] || return 1
+    # Detect cgroup v2 by the presence of cgroup.controllers at root.
+    [ -r "$cgroot/cgroup.controllers" ] || return 1
+    # Warn (but carry on) if the memory controller is unavailable.
+    if ! grep -qw memory "$cgroot/cgroup.controllers"; then
+        echo "zddc-cgroup-init: cgroup.controllers lacks 'memory' — per-conversion memory cap will be unenforced" >&2
+    fi
+    # Create the leaf where zddc-server itself will live.
+    mkdir -p "$cgroot/zddc" || return 1
+    # Move every PID currently in the root cgroup into zddc/. The
+    # root must be empty before we can enable subtree_control.
+    if [ -r "$cgroot/cgroup.procs" ]; then
+        while read -r pid; do
+            [ -n "$pid" ] || continue
+            # Best-effort; processes can exit between read and write.
+            printf "%s\n" "$pid" > "$cgroot/zddc/cgroup.procs" 2>/dev/null || true
+        done < "$cgroot/cgroup.procs"
+    fi
+    # Enable controllers at root → makes them usable in immediate
+    # children (zddc/ and any sibling per-conversion cgroup).
+    printf "+memory +pids" > "$cgroot/cgroup.subtree_control" 2>/dev/null || {
+        echo "zddc-cgroup-init: could not enable +memory +pids in $cgroot/cgroup.subtree_control — caps will not apply" >&2
+        return 1
+    }
+    # Also try zddc/. cgroup v2 rejects this with EBUSY while zddc/
+    # has member processes (this shell is one), hence the || true.
+    printf "+memory +pids" > "$cgroot/zddc/cgroup.subtree_control" 2>/dev/null || true
+    return 0
+}
+
+if ! setup_cgroup_v2; then
+    echo "zddc-cgroup-init: cgroup v2 setup unavailable — running without per-conversion caps" >&2
+fi
+
+# Hand off to zddc-server. The exec'd process lands in
+# /sys/fs/cgroup/zddc/ (we moved ourselves there above). When it
+# spawns the wrapper, the wrapper creates a transient sibling cgroup
+# under /sys/fs/cgroup/, NOT a child of zddc/, so the conversion's
+# cgroup is a peer of zddc-server's — keeping zddc-server's own
+# resource accounting separate from conversion accounting.
+exec "$@"
diff --git a/zddc/runtime/zddc-sandbox-exec b/zddc/runtime/zddc-sandbox-exec
new file mode 100755
index 0000000..af68a17
--- /dev/null
+++ b/zddc/runtime/zddc-sandbox-exec
@@ -0,0 +1,118 @@
+#!/bin/sh
+# zddc-sandbox-exec — drop-in wrapper for pandoc and chromium-browser.
+#
+# Invoked via symlinks at /usr/local/bin/pandoc and
+# /usr/local/bin/chromium-browser. zddc-server (and any other caller
+# that uses the default PATH) exec's by short name, hits this script
+# first, and we transparently run the real binary at /usr/bin/
+# inside:
+#
+#   1. a transient cgroup v2 (memory + pids cap, kernel-enforced)
+#   2. a bubblewrap sandbox (private namespaces, ro-bind /usr, fresh
+#      tmpfs at /tmp, no network)
+#
+# zddc-server's Go code does not know about either layer — its only
+# contract with the image is "if I exec pandoc with these args, I
+# get pandoc behavior back." Swap the wrapper for a different
+# isolation strategy (firejail, nspawn, podman-run, raw exec) and
+# nothing changes in Go.
+#
+# Caller-tunable env (with defaults):
+#
+#   ZDDC_SCRATCH          host directory to bind-mount read-write
+#                         inside the sandbox at the SAME path. Set by
+#                         zddc-server per-conversion; the markdown
+#                         template, intermediate HTML, and chromium
+#                         output PDF all live there. Absent = no extra
+#                         bind mount; /tmp is a fresh tmpfs only.
+#   ZDDC_CONV_MEM_MAX     cgroup memory.max value (default "1G").
+#                         cgroup v2 syntax — bytes, "1G", or "max".
+#   ZDDC_CONV_PIDS_MAX    cgroup pids.max value (default "256").
+#   ZDDC_CONV_TMPFS_SIZE  bwrap tmpfs /tmp byte size (default 256 MiB).
+
+set -eu
+
+NAME=$(basename "$0")
+REAL="/usr/bin/$NAME"
+
+if [ ! -x "$REAL" ]; then
+    echo "zddc-sandbox-exec: $NAME — real binary not found at $REAL" >&2
+    exit 127
+fi
+
+# ── 1. cgroup v2 (best-effort) ──────────────────────────────────────────
+#
+# zddc-cgroup-init enables +memory +pids in /sys/fs/cgroup/cgroup.
+# subtree_control at container start (see that script for the cgroup
+# v2 "no internal processes" wrinkle that requires the indirection).
+# Here we reap stale siblings, mkdir a per-call child, set caps, and
+# move ourselves in. The real binary inherits cgroup membership at exec.
+
+CG_ROOT="/sys/fs/cgroup"
+CG_CONTROL="$CG_ROOT/cgroup.subtree_control"
+
+if [ -w "$CG_CONTROL" ] && grep -qw memory "$CG_CONTROL" 2>/dev/null; then
+    # The kernel never removes an empty cgroup, and exec discards any
+    # EXIT trap, so first reap conv.PID dirs whose PID is gone (or ours).
+    for d in "$CG_ROOT"/conv.*; do
+        [ "${d##*.}" != "$$" ] && [ -d "/proc/${d##*.}" ] || rmdir "$d" 2>/dev/null || true
+    done
+    CG="$CG_ROOT/conv.$$"
+    if mkdir "$CG" 2>/dev/null; then
+        printf "%s\n" "${ZDDC_CONV_MEM_MAX:-1G}" > "$CG/memory.max" 2>/dev/null || true
+        printf "%s\n" "${ZDDC_CONV_PIDS_MAX:-256}" > "$CG/pids.max" 2>/dev/null || true
+        printf "%s\n" "$$" > "$CG/cgroup.procs" 2>/dev/null || true
+    fi
+fi
+
+# ── 2. bwrap sandbox ────────────────────────────────────────────────────
+#
+# Mirror the hardening that internal/convert previously assembled in
+# Go: unshare every namespace (--unshare-all also covers network),
+# bind /usr read-only so the binary + its libs are visible, drop a
+# fresh tmpfs at /tmp, clear the environment to a minimal floor.
+#
+# Building the bwrap argv preserves "$@" (the original pandoc /
+# chromium args) by PREPENDING bwrap flags onto the existing
+# positional parameters. Each `set -- new-flag "$@"` puts one flag
+# at the front, so reading the statements bottom-up gives the argv:
+#
+#   bwrap --unshare-all --unshare-user-try ... -- REAL_BINARY ORIG_ARGS
+#
+# This is the standard POSIX-sh idiom for "build a command line
+# without an array type."
+
+set -- "$REAL" "$@"          # REAL ORIG
+set -- -- "$@"               # -- REAL ORIG
+
+# Optional scratch dir, prepended just before "-- REAL ORIG" so it
+# lands inside the bwrap flag list:
+if [ -n "${ZDDC_SCRATCH:-}" ] && [ -d "$ZDDC_SCRATCH" ]; then
+    set -- --bind "$ZDDC_SCRATCH" "$ZDDC_SCRATCH" "$@"
+fi
+
+# Common bwrap flags (each one prepended; final order is bottom-up).
+set -- --setenv LANG C.UTF-8 "$@"
+set -- --setenv PATH /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin "$@"
+set -- --setenv HOME /tmp "$@"
+set -- --clearenv "$@"
+set -- --chdir /tmp "$@"
+# bwrap's --size sets the size of the NEXT --tmpfs, so in argv order
+# --size must come before --tmpfs. Building bottom-up via prepend means
+# the LATER statement here lands earlier in argv: write --tmpfs first
+# then --size, so the final $@ reads "... --size N --tmpfs /tmp ...".
+set -- --tmpfs /tmp "$@"
+set -- --size "${ZDDC_CONV_TMPFS_SIZE:-268435456}" "$@"
+set -- --dev /dev "$@"
+set -- --proc /proc "$@"
+set -- --ro-bind-try /etc /etc "$@"
+set -- --ro-bind-try /sbin /sbin "$@"
+set -- --ro-bind-try /bin /bin "$@"
+set -- --ro-bind-try /lib64 /lib64 "$@"
+set -- --ro-bind-try /lib /lib "$@"
+set -- --ro-bind /usr /usr "$@"
+set -- --die-with-parent "$@"
+set -- --unshare-user-try "$@"
+set -- --unshare-all "$@"
+
+exec bwrap "$@"
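The prepend idiom that zddc-sandbox-exec uses to build the bwrap argv is easier to check in isolation. The sketch below is illustrative only. build_argv is a hypothetical helper that replays a few of the wrapper's `set --` statements and prints the resulting argv one element per line instead of exec'ing bwrap. It assumes a POSIX sh and hard-codes the wrapper's default tmpfs size.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of the patch): replay a subset
# of zddc-sandbox-exec's prepend statements and print the final argv.
set -eu

build_argv() {
    real=$1; shift                # remaining "$@" = original tool args
    set -- "$real" "$@"           # REAL ORIG
    set -- -- "$@"                # -- REAL ORIG
    set -- --tmpfs /tmp "$@"      # written first, so it lands later...
    set -- --size 268435456 "$@"  # ...than --size (wrapper's default)
    set -- --unshare-all "$@"     # written last, so it comes out first
    printf '%s\n' "$@"            # one argv element per line
}

build_argv /usr/bin/pandoc --from markdown "file with spaces.md"
```

Run as-is, it prints --unshare-all, --size, 268435456, --tmpfs, /tmp, --, /usr/bin/pandoc, --from, markdown and "file with spaces.md" on separate lines. The statement written last comes out first, and the argument containing spaces survives as a single element because "$@" is always quoted.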