refactor(convert): wrapper-in-image owns the sandbox; Go just exec's binaries

The bwrap engine and the OCI engine that lived in
internal/convert/runner.go both leaked isolation policy into Go code.
They are replaced by a single image-side wrapper that shadows pandoc
and chromium-browser on PATH as a drop-in. zddc-server's only contract
with the image is now "exec.Command(name, args) gets you that tool's
behavior"; sandboxing, resource caps, and namespace setup live entirely
in shell scripts shipped by the image.

Architecture:
- zddc/runtime/zddc-cgroup-init runs at container start. cgroup v2's
  "no internal processes" constraint forbids a cgroup from having both
  children and processes; the init script moves PID 1 into a child,
  enables +memory +pids in subtree_control, then exec's zddc-server.
  Best-effort: degrades cleanly to "no resource caps" if cgroupfs
  isn't writable.
- zddc/runtime/zddc-sandbox-exec is the per-call wrapper, symlinked
  from /usr/local/bin/{pandoc,chromium-browser}. Creates a transient
  cgroup v2 (memory.max + pids.max), then bubblewrap-sandboxes the
  real binary at /usr/bin/<name>: --unshare-all, --ro-bind /usr,
  --proc /proc, --tmpfs /tmp, --clearenv. Caller's scratch dir comes
  in via ZDDC_SCRATCH env and is bind-mounted at the SAME path so
  absolute paths round-trip unchanged.
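
As a rough illustration of the Go side of this contract, the sketch below builds the child environment the wrapper is described as reading. The variable names ZDDC_SCRATCH, ZDDC_CONV_MEM_MAX and ZDDC_CONV_PIDS_MAX come from this commit; the value formats (bytes for memory, a decimal count for pids), the helper name sandboxEnv, and the example paths are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// sandboxEnv returns the extra environment entries for one conversion child.
// ASSUMPTION: the wrapper accepts memory in bytes and pids as a plain integer;
// the real scripts may expect a different encoding.
func sandboxEnv(scratch string, memMiB, pids int) []string {
	return []string{
		"ZDDC_SCRATCH=" + scratch,
		fmt.Sprintf("ZDDC_CONV_MEM_MAX=%d", int64(memMiB)*1024*1024),
		fmt.Sprintf("ZDDC_CONV_PIDS_MAX=%d", pids),
	}
}

func main() {
	// Hypothetical per-call scratch dir; the wrapper would bind-mount it
	// at the same absolute path inside the sandbox.
	scratch := filepath.Join(os.TempDir(), "zddc-conv-example")
	out := filepath.Join(scratch, "out.html")

	// Go only names the tool; whether "pandoc" is the wrapper or the raw
	// binary is decided by what the image puts on PATH.
	cmd := exec.Command("pandoc", "--from=markdown", "--to=html", "--output="+out)
	extra := sandboxEnv(scratch, 1024, 256)
	cmd.Env = append(os.Environ(), extra...)

	fmt.Println(cmd.Args)
	fmt.Println(extra)
}
```

Because the scratch dir is mounted at the same path, the absolute output path passed on the command line needs no translation between host and sandbox.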

Go simplifications (~250 lines net deletion):
- Runner interface: Run(ctx, binary, stdin, scratchDir, cmd) — no
  ToolSpec, no mount list, no engine concept. Single localRunner
  implementation; bwrapRunner + containerRunner both deleted.
- health.Probe just looks up pandoc + chromium on PATH; Capabilities
  drops engine kinds.
- Convert.go: ToHTML/ToPDF write to a per-call scratch dir under
  TMPDIR and pass absolute paths; the wrapper bind-mounts the dir.
  No more "/tpl" / "/pdf" mount-point indirection.
- Config drops --convert-pandoc-image, --convert-chromium-image,
  --convert-engine, --convert-podman-socket (OCI engine gone) and
  --convert-cpus (CPU caps don't apply in the new model — wall-clock
  + memory + pids is the cap set). Defaults raised to match the new
  caps the user authorized: mem 512→1024 MiB, pids 100→256,
  timeout 30→60 s.
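
A minimal sketch of what the slimmed-down Runner could look like, assuming the parameter types below. The commit only gives the parameter names (ctx, binary, stdin, scratchDir, cmd); the concrete types, the return values, and the helper runWithTimeout are hypothetical.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// Runner mirrors the shape named in the commit message.
// ASSUMPTION: parameter and return types are guesses for illustration.
type Runner interface {
	Run(ctx context.Context, binary string, stdin []byte, scratchDir string, args []string) ([]byte, error)
}

// localRunner execs the named binary directly; any sandboxing is whatever
// the binary on PATH (wrapper or raw tool) provides.
type localRunner struct{}

func (localRunner) Run(ctx context.Context, binary string, stdin []byte, scratchDir string, args []string) ([]byte, error) {
	cmd := exec.CommandContext(ctx, binary, args...)
	cmd.Stdin = bytes.NewReader(stdin)
	cmd.Env = append(os.Environ(), "ZDDC_SCRATCH="+scratchDir)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("%s: %w: %s", binary, err, stderr.String())
	}
	return stdout.Bytes(), nil
}

// runWithTimeout applies the wall-clock cap via context.WithTimeout, which
// kills the child when the deadline passes.
func runWithTimeout(r Runner, timeout time.Duration, binary string, stdin []byte, scratchDir string, args ...string) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return r.Run(ctx, binary, stdin, scratchDir, args)
}

func main() {
	out, err := runWithTimeout(localRunner{}, 60*time.Second, "pandoc",
		[]byte("# hi\n"), os.TempDir(), "--from=markdown", "--to=html")
	if err != nil {
		fmt.Println("conversion unavailable:", err)
		return
	}
	fmt.Printf("%s", out)
}
```

With no mount list or engine in the signature, the Go side has nothing left to translate; the same call works against the wrapper or a raw binary.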

Image:
- zddc/runtime.Containerfile builds the production runtime image
  (alpine + bubblewrap + pandoc + chromium + font-noto). Two
  COPY statements pull in the wrapper scripts; ln -s symlinks the
  shadow names.
- bitnest dev image mirrors this layout under /var/lib/zddc-dev-build/.

Container privilege required:
- Nested bwrap needs the outer container to permit user + mount
  namespace creation + MS_SLAVE on root. The default seccomp +
  AppArmor profiles block all of these. Quadlet adds:
    --cap-add=ALL
    --security-opt=seccomp=unconfined
    --security-opt=apparmor=unconfined
    --security-opt=unmask=ALL
  Helm chart sets the equivalent via securityContext (capabilities.
  add: SYS_ADMIN, seccompProfile.type: Unconfined, appArmorProfile.
  type: Unconfined). Trade-off documented in AGENTS.md: zddc-server
  RCE now has near-root power within the container, but the bind-
  mount layout still bounds blast radius; bwrap is the real boundary
  between zddc-server and untrusted markdown.

Tests: convert_test.go fully rewritten for the new Runner signature.
Drops TestBwrapArgs_* (functionality moved out of Go) and
TestImageTag (no more image refs). All 15 Go test packages green.

Verified live on bitnest: pandoc --version round-trip exits 0
through the wrapper; MD→DOCX produces a valid Word 2007+ file
end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZDDC 2026-05-19 07:47:58 -05:00
parent 847e082e6e
commit cef7188a77
14 changed files with 691 additions and 1118 deletions

@@ -345,19 +345,30 @@ The markdown editor lives at `browse/js/preview-markdown.js` and is mounted as t
 ## Server-side document conversion (`zddc/internal/convert`)
-zddc-server can convert `.md` → DOCX/HTML/PDF on demand at `GET /<path>/foo.md?convert=docx|html|pdf`. Implementation:
-- **Two engines, probed bwrap → podman → docker.** The first one found on PATH wins; `--convert-engine=` / `ZDDC_CONVERT_ENGINE` forces a choice.
-- **bwrap (production default).** Wraps `bubblewrap` to run `pandoc` and `chromium-browser` directly in a per-call Linux-namespace sandbox: `--unshare-all --unshare-user-try --die-with-parent --ro-bind /usr /usr ... --proc /proc --dev /dev --tmpfs /tmp --clearenv`. No daemon, no socket, no OCI image pull at conversion time. Binaries are baked into the zddc-server runtime image (`zddc/runtime.Containerfile`) so the operator just runs the image. Configure binary names via `--convert-pandoc-binary` (default `pandoc`) / `--convert-chromium-binary` (default `chromium-browser`; debian/ubuntu installs as `chromium`).
-- **podman / docker (legacy fallback).** Wraps `podman run` / `docker run` with `--rm --pull=missing --network=none --read-only --tmpfs=/tmp:size=256m,exec --memory --cpus --pids-limit --cap-drop=ALL --security-opt=no-new-privileges --env=HOME=/tmp`. Used when the operator wants OCI-image isolation per conversion and already has an engine on PATH. Default images `docker.io/pandoc/latex:latest` (override via `--convert-pandoc-image=` / `ZDDC_CONVERT_PANDOC_IMAGE`) and `docker.io/zenika/alpine-chrome:latest` (override via `--convert-chromium-image=`).
-- Resource caps via `--convert-mem-mib` (default 512), `--convert-cpus` (default "2"), `--convert-pids` (default 100), `--convert-timeout` (default 30s). bwrap stores them advisorily (no cgroup enforcement in this iteration); the OCI engine maps them to `--memory` / `--cpus` / `--pids-limit`.
-- I/O via bind mount + stdin/stdout. Pandoc reads markdown from stdin, writes to stdout. The viewer template is bind-mounted read-only at `/tpl`. Chromium reads HTML from a read-write bind mount at `/pdf` and writes the PDF to the same mount; the host reads it back. Mount-spec syntax (`host:target[:ro|:rw]`) is identical across engines; the runner translates to `--ro-bind` / `--bind` (bwrap) or `--volume` (podman/docker).
+zddc-server can convert `.md` → DOCX/HTML/PDF on demand at `GET /<path>/foo.md?convert=docx|html|pdf`.
+**Architecture.** zddc-server's Go code does the bare minimum: it `exec.Command("pandoc", args...)` or `exec.Command("chromium-browser", args...)`. **The sandbox + resource caps live in the IMAGE**, not in Go. In the production runtime image (`zddc/runtime.Containerfile`), `/usr/local/bin/pandoc` and `/usr/local/bin/chromium-browser` are symlinks to `zddc-sandbox-exec` — a shell wrapper that:
+1. Creates a transient cgroup v2 (memory + pids cap from `ZDDC_CONV_MEM_MAX` / `ZDDC_CONV_PIDS_MAX` env), moves itself in.
+2. Wraps the real binary at `/usr/bin/<name>` in a bubblewrap sandbox (`--unshare-all --unshare-user-try --die-with-parent --ro-bind /usr /usr ... --proc /proc --dev /dev --tmpfs /tmp --clearenv`).
+3. exec's `/usr/bin/<name>` with the original argv.
+Why this shape: swapping isolation strategies (firejail, systemd-nspawn, podman-run, raw exec for dev) is purely an image concern. The Go code never changed. A separate `zddc-cgroup-init` script runs at container start to delegate cgroup v2 `subtree_control` (the "no internal processes" constraint), then exec's zddc-server. Both scripts live in `zddc/runtime/`.
+**Outer-container privileges.** Nested bwrap needs the outer container to permit user + mount namespace creation. Pod Security Standards defaults block this. The helm chart sets `securityContext: capabilities.add: [SYS_ADMIN]`, `seccompProfile.type: Unconfined`, `appArmorProfile.type: Unconfined`. Trade-off: a zddc-server RCE has near-root power within the container's namespace, but the bind-mount layout (overlay fs, no host /home or /usr visible) still bounds the blast radius. The per-conversion bwrap sandbox is the real isolation boundary between zddc-server and untrusted pandoc/chromium.
+**Config knobs** (all in `cmd/zddc-server`):
+- `--convert-pandoc-binary` (default `pandoc`) / `--convert-chromium-binary` (default `chromium-browser`; `chromium` on debian)
+- `--convert-scratch-dir` (default `$TMPDIR`) — host scratch root; the wrapper bind-mounts the per-call subdir
+- `--convert-mem-mib` (default 1024) → wrapper's `memory.max`
+- `--convert-pids` (default 256) → wrapper's `pids.max`
+- `--convert-timeout` (default 60s) → enforced in Go via `context.WithTimeout`
+**Other plumbing.**
+- I/O via stdin/stdout + scratch dir. Pandoc reads markdown from stdin, writes to stdout. Templates + intermediate HTML + output PDF live in a per-call subdir under the scratch root; the dir's host path is passed to the child via `ZDDC_SCRATCH` so the wrapper bind-mounts it into the sandbox at the same path (no path translation).
 - Output cached at `<dir>/.converted/<base>.<ext>` (hidden by the `.` prefix). mtime synced to source so the fast path is a stat-and-serve with no exec. PUT/DELETE/MOVE on the source `.md` purges the sidecars.
 - Per-project template variables (client/project/contractor/project_number) come from `.zddc` `convert:` cascade keys. Title/tracking_number/revision/status are derived from the filename via `zddc.ParseFilename`.
-- If no sandbox engine is found on PATH, the endpoint serves 503 with a Retry-After. The rest of the server keeps working.
+- If pandoc/chromium aren't on PATH (operator running zddc-server outside the runtime image), the endpoint serves 503 with a Retry-After. The rest of the server keeps working. Operators who run zddc-server with raw pandoc/chromium (no wrapper) get a working but unsandboxed conversion endpoint — useful for dev iteration.
 ## Form-data system (`form/` + zddc-server form handler)

@@ -403,7 +403,7 @@ Files at the root level are ignored. The grouping folder list and transmittal fo
 **Dependencies:** Toast UI Editor v3.2.2 (vendored at `shared/vendor/toastui-editor-all.min.js`, concatenated into `browse/dist/browse.html` at build time). No runtime CDN, no Tailwind.
-**Server-mode features:** When the file handle is an `HttpFileHandle` (so `node.url` is set and `state.source === 'server'`), three Download buttons appear in the file header — DOCX/HTML/PDF — fetching `?convert=<fmt>` via `window.zddc.source.downloadConverted()`. Clicks auto-save first if the buffer is dirty so converted bytes reflect what's on screen. The server-side engine is in `zddc/internal/convert` — bwrap is the default sandbox (per-call Linux namespaces, no daemon, pandoc/chromium binaries baked into the runtime image), with podman/docker as legacy OCI-image fallbacks for hosts that already have a container engine.
+**Server-mode features:** When the file handle is an `HttpFileHandle` (so `node.url` is set and `state.source === 'server'`), three Download buttons appear in the file header — DOCX/HTML/PDF — fetching `?convert=<fmt>` via `window.zddc.source.downloadConverted()`. Clicks auto-save first if the buffer is dirty so converted bytes reflect what's on screen. The server-side engine is in `zddc/internal/convert`: zddc-server `exec.Command`s `pandoc` and `chromium-browser` directly, and the runtime image's wrapper at `/usr/local/bin/<name>` (see `zddc/runtime.Containerfile` + `zddc/runtime/zddc-sandbox-exec`) handles the per-call cgroup v2 + bubblewrap sandbox between that exec and the real binary at `/usr/bin/<name>`. Isolation strategy lives entirely in the image; swap the wrapper for firejail / nspawn / podman-run and Go doesn't change.
 ---

@@ -64,7 +64,36 @@ spec:
 - name: zddc-server
 image: {{ printf "%s:%s" .Values.runtimeImage.repository .Values.runtimeImage.tag | quote }}
 imagePullPolicy: IfNotPresent
-command: ["/zddc/zddc-server"]
+# zddc-cgroup-init prepares cgroup v2 subtree_control then
+# exec's zddc-server. Required because cgroup v2 forbids
+# processes in a cgroup that has child cgroups; the per-
+# conversion wrapper (zddc-sandbox-exec) creates child
+# cgroups for resource caps, so the init script has to
+# move zddc-server itself out of the root cgroup first.
+# See zddc/runtime/zddc-cgroup-init in the source repo.
+command: ["/usr/local/libexec/zddc-cgroup-init", "/zddc/zddc-server"]
+# The conversion sandbox (bwrap, invoked per-call by
+# /usr/local/bin/{pandoc,chromium-browser}) needs to create
+# user + mount namespaces inside the container. Pod Security
+# Standards default policies forbid this; the chart sets the
+# minimum securityContext that lets bwrap function. If your
+# cluster's admission controller rejects these settings, you
+# have two choices: ask the platform team to allow this pod,
+# or accept that /.convert serves 503 (the rest of zddc-
+# server still works fine without conversion).
+securityContext:
+capabilities:
+add: ["SYS_ADMIN"]
+# cap-add SYS_ADMIN alone isn't enough — see the
+# zddc/runtime/zddc-sandbox-exec docstring for the full
+# set of LSM relaxations required. K8s 1.30+ supports
+# specifying seccompProfile + appArmorProfile fields;
+# if your cluster is older, you'll need annotations:
+# container.apparmor.security.beta.kubernetes.io/zddc-server: unconfined
+seccompProfile:
+type: Unconfined
+appArmorProfile:
+type: Unconfined
 ports:
 - name: http
 containerPort: 8080

@@ -87,29 +87,24 @@ func main() {
 "addr", cfg.Addr,
 "embedded_apps", embeddedVersionsForLog(embedded))
-// Probe the container runtime for the MD→{docx,html,pdf} endpoint.
-// Non-fatal: if the host has no podman/docker (or the remote
-// socket is unreachable in sidecar mode), conversion requests
-// return 503 and everything else keeps working. The probe installs
-// the package-level Runner when an engine is found; the configured
-// Sandbox probe order is bwrap → podman → docker. The
-// production-default bwrap engine reads the binary names below
-// (pandoc + chromium are baked into the zddc-server image);
-// the legacy OCI engines read the image refs and pull them
-// lazily on first conversion via `--pull=missing`. The probe
-// installs whichever runner the engine resolves to.
+// Probe pandoc + chromium for the MD→{docx,html,pdf} endpoint.
+// Non-fatal: if either binary isn't on PATH (operator running
+// zddc-server outside the runtime image), conversion requests
+// return 503 and everything else keeps working.
 //
-// SetRemoteURL + SetScratchDir must run BEFORE Probe so the
-// OCI-engine path can hit the sidecar socket when one is
-// configured; bwrap ignores both.
-convert.SetImages(cfg.ConvertPandocImage, cfg.ConvertChromiumImage)
+// In the production runtime image, "pandoc" and "chromium-browser"
+// on PATH resolve to wrapper scripts at /usr/local/bin/<name>
+// that put the real binary into a cgroup v2 + bwrap sandbox
+// before exec'ing it. zddc-server is unaware — it just sees
+// the corresponding tool's behavior. The wrapper reads
+// ZDDC_CONV_MEM_MAX, ZDDC_CONV_PIDS_MAX, and ZDDC_SCRATCH from
+// the child env to drive cgroup setup + scratch-dir bind mount.
 convert.SetBinaries(cfg.ConvertPandocBinary, cfg.ConvertChromiumBinary)
-convert.SetRemoteURL(cfg.ConvertPodmanSocket)
 convert.SetScratchDir(cfg.ConvertScratchDir)
 probeCtx, probeCancel := context.WithTimeout(context.Background(), 5*time.Second)
-convert.Probe(probeCtx, cfg.ConvertEngine)
+convert.Probe(probeCtx)
 probeCancel()
-convert.ConfigureLimits(cfg.ConvertMemMiB, cfg.ConvertCPUs, cfg.ConvertPIDs, cfg.ConvertTimeout)
+convert.ConfigureLimits(cfg.ConvertMemMiB, cfg.ConvertPIDs, cfg.ConvertTimeout)
 // Client mode short-circuit: when cfg.Upstream is set, this binary
 // runs as a downstream proxy/cache/mirror rather than a master.
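
The PATH-only probe described in the hunk above could look roughly like the sketch below. It is not the package's actual Probe: the Capabilities fields and the function name probe are hypothetical, and the real version also takes a context.

```go
package main

import (
	"fmt"
	"os/exec"
)

// Capabilities is a hypothetical stand-in for the package's type; it records
// the resolved path of each tool, or "" when the tool is missing.
type Capabilities struct {
	Pandoc   string
	Chromium string
}

// probe resolves both binary names on PATH. Conversion is reported as
// available only when both resolve, matching the "either binary missing
// means 503" behaviour described in the comment.
func probe(pandocName, chromiumName string) (Capabilities, bool) {
	var caps Capabilities
	if p, err := exec.LookPath(pandocName); err == nil {
		caps.Pandoc = p
	}
	if p, err := exec.LookPath(chromiumName); err == nil {
		caps.Chromium = p
	}
	return caps, caps.Pandoc != "" && caps.Chromium != ""
}

func main() {
	caps, ok := probe("pandoc", "chromium-browser")
	if !ok {
		fmt.Println("conversion disabled; requests would get 503:", caps)
		return
	}
	fmt.Println("conversion enabled:", caps)
}
```

In the runtime image the resolved paths would be the wrapper symlinks under /usr/local/bin; outside it they would be the raw binaries, and the probe cannot tell the two apart.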

@@ -48,26 +48,18 @@ type Config struct {
 ArchiveRescanInterval time.Duration // --archive-rescan-interval / ZDDC_ARCHIVE_RESCAN_INTERVAL — periodic full re-walk of the archive index. Covers SMB/CIFS where inotify misses cross-client writes. Default 60s; 0 to disable.
 // MD→{docx,html,pdf} conversion endpoint (see internal/convert).
-// The server shells out to upstream pandoc + chromium container
-// images via podman or docker, pulling each on first use via
-// production default. The engine probe order is bwrap → podman →
-// docker; the first one found on PATH wins. bwrap runs the
-// pandoc + chromium binaries baked into the zddc-server image
-// in a per-call Linux-namespace sandbox (no daemon, no socket,
-// no OCI image pull). podman/docker are legacy fallbacks for
-// hosts that already have a container engine and want OCI-image
-// isolation per conversion.
-ConvertPandocImage string // --convert-pandoc-image / ZDDC_CONVERT_PANDOC_IMAGE — image for MD→DOCX/HTML when the OCI engine is selected. Default docker.io/pandoc/latex:latest.
-ConvertChromiumImage string // --convert-chromium-image / ZDDC_CONVERT_CHROMIUM_IMAGE — image for HTML→PDF when the OCI engine is selected. Default docker.io/zenika/alpine-chrome:latest.
-ConvertPandocBinary string // --convert-pandoc-binary / ZDDC_CONVERT_PANDOC_BINARY — pandoc binary name (PATH-resolved) when the bwrap engine is selected. Default "pandoc".
-ConvertChromiumBinary string // --convert-chromium-binary / ZDDC_CONVERT_CHROMIUM_BINARY — chromium binary name (PATH-resolved) when the bwrap engine is selected. Default "chromium-browser" (alpine); set to "chromium" on debian.
-ConvertEngine string // --convert-engine / ZDDC_CONVERT_ENGINE — override sandbox binary (default: probe for bwrap, then podman, then docker).
-ConvertPodmanSocket string // --convert-podman-socket / ZDDC_CONVERT_PODMAN_SOCKET — when non-empty, run podman in remote mode against this Unix socket (e.g. unix:///var/run/podman/podman.sock). Used with the Kubernetes sidecar pattern so zddc-server's own pod stays unprivileged.
-ConvertScratchDir string // --convert-scratch-dir / ZDDC_CONVERT_SCRATCH_DIR — directory used for per-conversion scratch (template + HTML/PDF intermediates). Must be a path the remote podman can see at the same path. Empty = use $TMPDIR (local-mode default).
-ConvertMemMiB int // --convert-mem-mib / ZDDC_CONVERT_MEM_MIB — per-container memory cap in MiB. Default 512.
-ConvertCPUs string // --convert-cpus / ZDDC_CONVERT_CPUS — per-container CPU limit. Default "2".
-ConvertPIDs int // --convert-pids / ZDDC_CONVERT_PIDS — per-container PID limit. Default 100.
-ConvertTimeout time.Duration // --convert-timeout / ZDDC_CONVERT_TIMEOUT — per-conversion wall clock. Default 30s.
+// zddc-server exec's `pandoc` and `chromium-browser` directly.
+// In the production runtime image those names resolve to wrapper
+// scripts at /usr/local/bin/ that put the real binary into a
+// cgroup v2 + bubblewrap sandbox before exec'ing it — see
+// zddc/runtime.Containerfile + zddc/runtime/zddc-sandbox-exec.
+// zddc-server is unaware of sandboxing; the image owns it.
+ConvertPandocBinary string // --convert-pandoc-binary / ZDDC_CONVERT_PANDOC_BINARY — pandoc binary name (PATH-resolved) or absolute path. Default "pandoc". Resolves to the wrapper script in the runtime image.
+ConvertChromiumBinary string // --convert-chromium-binary / ZDDC_CONVERT_CHROMIUM_BINARY — chromium binary name (PATH-resolved) or absolute path. Default "chromium-browser" (alpine); set to "chromium" on debian.
+ConvertScratchDir string // --convert-scratch-dir / ZDDC_CONVERT_SCRATCH_DIR — directory used for per-conversion scratch (template + HTML/PDF intermediates). The wrapper bind-mounts this into the sandbox at the same path. Empty = use $TMPDIR.
+ConvertMemMiB int // --convert-mem-mib / ZDDC_CONVERT_MEM_MIB — per-conversion memory cap in MiB (advisory; passed to the wrapper via ZDDC_CONV_MEM_MAX, applied as cgroup v2 memory.max). Default 1024.
+ConvertPIDs int // --convert-pids / ZDDC_CONVERT_PIDS — per-conversion PID cap (passed to the wrapper via ZDDC_CONV_PIDS_MAX, applied as cgroup v2 pids.max). Default 256.
+ConvertTimeout time.Duration // --convert-timeout / ZDDC_CONVERT_TIMEOUT — per-conversion wall clock (enforced in zddc-server via context.WithTimeout). Default 60s.
 }
 // ErrHelpRequested is returned by Load when --help is passed; the caller
@@ -146,28 +138,18 @@ func Load(args []string) (Config, error) {
 "Maximum PUT body size in bytes for the file API. Default 256 MiB. Larger requests are rejected with 413.")
 archiveRescanIntervalFlag := fs.Duration("archive-rescan-interval", parseDurationOrDefault(os.Getenv("ZDDC_ARCHIVE_RESCAN_INTERVAL"), 60*time.Second),
 "Periodic full re-walk of the archive index. Required on SMB/CIFS-backed roots where inotify misses cross-client writes. Default 60s; set 0 to disable.")
-convertPandocImageFlag := fs.String("convert-pandoc-image", getEnv("ZDDC_CONVERT_PANDOC_IMAGE", "docker.io/pandoc/latex:latest"),
-"Pandoc OCI image for MD→DOCX / MD→HTML, used only when the OCI engine (podman/docker) is selected. Pulled on first use via --pull=missing.")
-convertChromiumImageFlag := fs.String("convert-chromium-image", getEnv("ZDDC_CONVERT_CHROMIUM_IMAGE", "docker.io/zenika/alpine-chrome:latest"),
-"Chromium OCI image for HTML→PDF, used only when the OCI engine is selected. Pulled on first use via --pull=missing.")
 convertPandocBinaryFlag := fs.String("convert-pandoc-binary", getEnv("ZDDC_CONVERT_PANDOC_BINARY", "pandoc"),
-"Pandoc binary name (PATH-resolved) when the bwrap engine is selected. Default \"pandoc\".")
+"Pandoc binary name (PATH-resolved) or absolute path. Default \"pandoc\". In the runtime image this resolves to the wrapper at /usr/local/bin/pandoc which sandboxes the real binary.")
 convertChromiumBinaryFlag := fs.String("convert-chromium-binary", getEnv("ZDDC_CONVERT_CHROMIUM_BINARY", "chromium-browser"),
-"Chromium binary name (PATH-resolved) when the bwrap engine is selected. Default \"chromium-browser\" (alpine); set to \"chromium\" on debian/ubuntu.")
-convertEngineFlag := fs.String("convert-engine", os.Getenv("ZDDC_CONVERT_ENGINE"),
-"Conversion sandbox override (default: probe for bwrap, then podman, then docker).")
-convertPodmanSocketFlag := fs.String("convert-podman-socket", os.Getenv("ZDDC_CONVERT_PODMAN_SOCKET"),
-"Run podman in remote mode against this Unix socket URL (e.g. unix:///var/run/podman/podman.sock). When set, the engine binary is invoked as `podman --remote --url=<this> run …`; the actual container creation happens in whatever process owns the socket (typically a podman-system-service sidecar). Empty = local mode.")
+"Chromium binary name (PATH-resolved) or absolute path. Default \"chromium-browser\" (alpine); set to \"chromium\" on debian/ubuntu.")
 convertScratchDirFlag := fs.String("convert-scratch-dir", os.Getenv("ZDDC_CONVERT_SCRATCH_DIR"),
-"Scratch directory for per-conversion intermediates (template, HTML, PDF). In remote mode this MUST be a path that the podman-service side can see at the same path — typically a shared emptyDir mounted at the same mountPath in both containers. Empty = use $TMPDIR (local mode).")
+"Scratch directory for per-conversion intermediates (template, HTML, PDF). The runtime image's wrapper bind-mounts this into the sandbox at the same path. Empty = use $TMPDIR.")
-convertMemMiBFlag := fs.Int("convert-mem-mib", parseIntOrDefault(os.Getenv("ZDDC_CONVERT_MEM_MIB"), 512),
-"Per-conversion container memory limit in MiB. Default 512.")
-convertCPUsFlag := fs.String("convert-cpus", getEnv("ZDDC_CONVERT_CPUS", "2"),
-"Per-conversion container CPU limit (passed to --cpus). Default 2.")
-convertPIDsFlag := fs.Int("convert-pids", parseIntOrDefault(os.Getenv("ZDDC_CONVERT_PIDS"), 100),
-"Per-conversion container PID limit. Default 100.")
-convertTimeoutFlag := fs.Duration("convert-timeout", parseDurationOrDefault(os.Getenv("ZDDC_CONVERT_TIMEOUT"), 30*time.Second),
-"Per-conversion wall-clock timeout. Default 30s.")
+convertMemMiBFlag := fs.Int("convert-mem-mib", parseIntOrDefault(os.Getenv("ZDDC_CONVERT_MEM_MIB"), 1024),
+"Per-conversion memory limit in MiB (advisory; passed to the runtime-image wrapper via ZDDC_CONV_MEM_MAX, applied as cgroup v2 memory.max). Default 1024.")
+convertPIDsFlag := fs.Int("convert-pids", parseIntOrDefault(os.Getenv("ZDDC_CONVERT_PIDS"), 256),
+"Per-conversion PID limit (passed to the runtime-image wrapper via ZDDC_CONV_PIDS_MAX, applied as cgroup v2 pids.max). Default 256.")
+convertTimeoutFlag := fs.Duration("convert-timeout", parseDurationOrDefault(os.Getenv("ZDDC_CONVERT_TIMEOUT"), 60*time.Second),
+"Per-conversion wall-clock timeout (enforced in zddc-server via context.WithTimeout). Default 60s.")
 accessLogFlag := fs.String("access-log", os.Getenv("ZDDC_ACCESS_LOG"),
 "Tee structured access logs to this file (JSON, size-rotated). "+
 "Default: <ZDDC_ROOT>/.zddc.d/logs/access-<hostname>.log. "+
@@ -239,15 +221,10 @@ func Load(args []string) (Config, error) {
 AppsPubKey: *appsPubKeyFlag,
 MaxWriteBytes: *maxWriteBytesFlag,
 ArchiveRescanInterval: *archiveRescanIntervalFlag,
-ConvertPandocImage: *convertPandocImageFlag,
-ConvertChromiumImage: *convertChromiumImageFlag,
 ConvertPandocBinary: *convertPandocBinaryFlag,
 ConvertChromiumBinary: *convertChromiumBinaryFlag,
-ConvertEngine: *convertEngineFlag,
-ConvertPodmanSocket: *convertPodmanSocketFlag,
 ConvertScratchDir: *convertScratchDirFlag,
 ConvertMemMiB: *convertMemMiBFlag,
-ConvertCPUs: *convertCPUsFlag,
 ConvertPIDs: *convertPIDsFlag,
 ConvertTimeout: *convertTimeoutFlag,
 }
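
The flag definitions above use an env-var value as the flag default, so precedence is flag, then environment, then the built-in default. A small sketch of that pattern follows. The name parseIntOrDefault appears in the diff, but its body is not shown there, so the implementation below is an assumption; the fixed argument slice is used only to keep the example deterministic.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// parseIntOrDefault returns def when s is empty or not a valid integer.
// ASSUMPTION: this body is a guess at the helper's behaviour.
func parseIntOrDefault(s string, def int) int {
	n, err := strconv.Atoi(s)
	if err != nil {
		return def
	}
	return n
}

func main() {
	fs := flag.NewFlagSet("zddc-server", flag.ContinueOnError)
	// The env var supplies the default; an explicit flag overrides it.
	mem := fs.Int("convert-mem-mib",
		parseIntOrDefault(os.Getenv("ZDDC_CONVERT_MEM_MIB"), 1024),
		"Per-conversion memory limit in MiB.")
	pids := fs.Int("convert-pids",
		parseIntOrDefault(os.Getenv("ZDDC_CONVERT_PIDS"), 256),
		"Per-conversion PID limit.")

	// Fixed arguments for the example: only --convert-pids is overridden.
	if err := fs.Parse([]string{"--convert-pids=128"}); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("mem MiB:", *mem, "pids:", *pids)
}
```

A malformed environment value silently falls back to the default in this sketch; whether the real helper logs or rejects it is not visible in the diff.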

@@ -1,20 +1,15 @@
 // Package convert turns a markdown source byte-buffer into DOCX, HTML,
-// or PDF. Pandoc handles MD↔DOCX and MD→HTML; headless Chromium handles
-// HTML→PDF. Each conversion runs inside an isolating sandbox so an
-// untrusted source-markdown can't reach the host's filesystem or
-// network even if it drives the binary to RCE.
+// or PDF by exec'ing pandoc and chromium-browser. Each conversion runs
+// inside a sandbox provided by the IMAGE — typically a wrapper script
+// at /usr/local/bin/<binary> that puts the real binary into a cgroup
+// v2 + bubblewrap sandbox before exec'ing it. See
+// zddc/runtime.Containerfile for the production setup.
 //
-// Engine probe order (call Probe once at startup, first hit wins):
-//
-// 1. bwrap (production default). Runs the pandoc/chromium binaries
-// baked into the zddc-server runtime image directly under
-// bubblewrap: namespace-isolated, no network, read-only /usr, a
-// 256 MiB tmpfs /tmp, minimal proc/dev. Configure binary names
-// via SetBinaries; defaults are `pandoc` and `chromium-browser`.
-// 2. podman / docker (legacy fallback). Runs each conversion inside
-// an OCI container pulled lazily via `--pull=missing`. Defaults
-// `docker.io/pandoc/latex:latest` + `docker.io/zenika/alpine-
-// chrome:latest`; configure via SetImages.
+// zddc-server's Go code is unaware of sandboxing: it just exec's
+// "pandoc" or "chromium-browser" and gets the corresponding tool's
+// behavior back. Operators who want a different isolation strategy
+// (firejail, systemd-nspawn, podman-run, raw exec for dev) replace
+// the wrapper script in their image; the Go binary doesn't change.
 //
 // Public surface:
 //
@@ -22,16 +17,13 @@
 // ToHTML(ctx, source, meta) → []byte (standalone HTML)
 // ToPDF (ctx, source, meta) → []byte (PDF, via HTML + chromium)
 //
-// Probe(ctx, override) → Capabilities (call once at startup)
+// Probe(ctx) → Capabilities (call once at startup)
 // Available() → (Capabilities, bool)
-// SetImages(pandoc, chromium) — install OCI image refs from config
-// SetBinaries(pandoc, chromium) — install bwrap binary names from config
+// SetBinaries(pandoc, chromium) — install binary names from config
+// SetScratchDir(dir) — install scratch root from config
 //
 // All three converters are safe for concurrent use; each call gets a
-// fresh sandbox. The pandoc binary (or pandoc/latex image's entrypoint)
-// reads pandoc flags directly; the chromium binary (or alpine-chrome
-// image's entrypoint) reads chromium-browser flags. No `sh -c`
-// wrappers, no shell quoting.
+// fresh scratch dir + (image-provided) sandbox.
 //
 // Metadata maps to the placeholders consumed by viewer-template.html.
 // title/tracking_number/revision/status/is_draft typically come from
@@ -66,55 +58,33 @@ type Metadata struct {
 NoTOC bool
 }
-// Default tool refs. The bwrap engine (default since v0.0.x) reads the
-// Binary fields below; the legacy containerRunner reads the Image
-// fields. The convert entry points populate both into a ToolSpec so
-// whichever engine is installed picks the field it needs.
+// Default binary names. The runtime image installs WRAPPER scripts at
+// /usr/local/bin/pandoc and /usr/local/bin/chromium-browser (shadowing
+// the real binaries in /usr/bin/) so these names resolve through the
+// sandbox automatically. Operators running zddc-server outside the
+// runtime image with raw binaries on PATH still get a working
+// conversion endpoint — just without the per-call sandbox.
 //
-// pandoc/latex carries TeX Live for native PDF too, so the image is a
-// superset of pandoc/core. The bwrap engine doesn't pay that cost —
-// each binary is installed from the host's package manager (alpine:
-// pandoc-cli + chromium) and the image grows by ≈ 200 MB once.
+// Alpine's chromium package installs the binary as "chromium-browser";
+// debian/ubuntu ships "chromium". Operators override via
+// --convert-chromium-binary when the package on their image differs.
 const (
-DefaultPandocImage = "docker.io/pandoc/latex:latest"
-DefaultChromiumImage = "docker.io/zenika/alpine-chrome:latest"
 DefaultPandocBinary = "pandoc"
-// Alpine's chromium package installs the binary as "chromium-browser".
-// Debian/Ubuntu ships "chromium". Operators override via
-// --convert-chromium-binary when the package on their image differs.
 DefaultChromiumBinary = "chromium-browser"
 )
 var (
-pandocImage atomic.Pointer[string]
-chromiumImage atomic.Pointer[string]
 pandocBinary atomic.Pointer[string]
 chromiumBinary atomic.Pointer[string]
 scratchDir atomic.Pointer[string]
 )
-// SetImages installs the OCI image refs used by the legacy
-// containerRunner engine. The bwrap engine ignores these and reads
-// the binary names installed via SetBinaries instead. Empty values
-// keep the previous setting (or the DefaultPandocImage /
-// DefaultChromiumImage constants on first call). Called from
-// cmd/zddc-server/main.go after flag parsing.
-func SetImages(pandoc, chromium string) {
-if pandoc != "" {
-s := pandoc
-pandocImage.Store(&s)
-}
-if chromium != "" {
-s := chromium
-chromiumImage.Store(&s)
-}
-}
-// SetBinaries installs the host-binary names used by the bwrap engine.
-// Empty values keep the previous setting (or the DefaultPandocBinary /
+// SetBinaries installs the binary names used by Probe/Run. Empty
+// values keep the previous setting (or the DefaultPandocBinary /
// DefaultChromiumBinary constants on first call). The values are // DefaultChromiumBinary constants on first call). The values are
// PATH-resolved names (e.g. "pandoc", "chromium-browser") or absolute // PATH-resolved names (e.g. "pandoc", "chromium-browser") or
// paths. Called from cmd/zddc-server/main.go after flag parsing. // absolute paths. Called from cmd/zddc-server/main.go after flag
// parsing.
func SetBinaries(pandoc, chromium string) { func SetBinaries(pandoc, chromium string) {
if pandoc != "" { if pandoc != "" {
s := pandoc s := pandoc
@ -126,12 +96,11 @@ func SetBinaries(pandoc, chromium string) {
} }
} }
// SetScratchDir installs the host-side scratch root used for per-call // SetScratchDir installs the host-side scratch root used for
// intermediates (template, HTML, PDF). Empty means "use $TMPDIR" — the // per-call intermediates (template, HTML, PDF). Empty means "use
// local-mode default. In remote mode this MUST be a path the podman- // $TMPDIR". The runtime-image wrapper bind-mounts the per-call
// service sidecar can see at the same mountpoint, typically a shared // scratch dir into its sandbox at the same path, so any path under
// emptyDir mounted at /work in both containers. Called from // this root works.
// cmd/zddc-server/main.go after flag parsing.
func SetScratchDir(dir string) { func SetScratchDir(dir string) {
s := dir s := dir
scratchDir.Store(&s) scratchDir.Store(&s)
@ -144,20 +113,6 @@ func currentScratchDir() string {
return "" return ""
} }
func currentPandocImage() string {
if p := pandocImage.Load(); p != nil && *p != "" {
return *p
}
return DefaultPandocImage
}
func currentChromiumImage() string {
if p := chromiumImage.Load(); p != nil && *p != "" {
return *p
}
return DefaultChromiumImage
}
func currentPandocBinary() string { func currentPandocBinary() string {
if p := pandocBinary.Load(); p != nil && *p != "" { if p := pandocBinary.Load(); p != nil && *p != "" {
return *p return *p
@ -172,20 +127,10 @@ func currentChromiumBinary() string {
return DefaultChromiumBinary return DefaultChromiumBinary
} }
// pandocTool / chromiumTool build the ToolSpec passed to Runner.Run. // ToDocx renders source markdown to DOCX bytes. Single pandoc exec;
// Both fields are populated so whichever engine is installed picks // no scratch dir needed (stdin → stdout). The caller passes the
// the one it needs (bwrap reads Binary; containerRunner reads Image). // full file content (envelope + body); pandoc handles
func pandocTool() ToolSpec { // `markdown+yaml_metadata_block` natively.
return ToolSpec{Image: currentPandocImage(), Binary: currentPandocBinary()}
}
func chromiumTool() ToolSpec {
return ToolSpec{Image: currentChromiumImage(), Binary: currentChromiumBinary()}
}
// ToDocx renders source markdown to DOCX bytes. One container run via
// the pandoc image. Caller passes the full file content (envelope +
// body); pandoc handles `markdown+yaml_metadata_block` natively.
func ToDocx(ctx context.Context, source []byte, m Metadata) ([]byte, error) { func ToDocx(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
r := currentRunner() r := currentRunner()
if r == nil { if r == nil {
@ -198,13 +143,14 @@ func ToDocx(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
} }
cmd = append(cmd, metadataArgs(m)...) cmd = append(cmd, metadataArgs(m)...)
cmd = append(cmd, "-") cmd = append(cmd, "-")
return r.Run(ctx, pandocTool(), source, nil, cmd) return r.Run(ctx, currentPandocBinary(), source, "", cmd)
} }
// ToHTML renders source markdown to standalone HTML using // ToHTML renders source markdown to standalone HTML using
// viewer-template.html. Embeds CSS + images via --embed-resources. // viewer-template.html. Embeds CSS + images via --embed-resources.
// Template + custom.css are bind-mounted into the container at /tpl // Template + custom.css live in a per-call scratch dir; the host
// from a per-call scratch dir. // path is passed via ZDDC_SCRATCH so the wrapper bind-mounts it
// into the sandbox at the same path.
func ToHTML(ctx context.Context, source []byte, m Metadata) ([]byte, error) { func ToHTML(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
r := currentRunner() r := currentRunner()
if r == nil { if r == nil {
@ -216,6 +162,7 @@ func ToHTML(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
} }
defer os.RemoveAll(scratch) defer os.RemoveAll(scratch)
tplPath := filepath.Join(scratch, "viewer-template.html")
cmd := []string{ cmd := []string{
"--from=markdown+yaml_metadata_block", "--from=markdown+yaml_metadata_block",
"--to=html5", "--to=html5",
@ -224,29 +171,27 @@ func ToHTML(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
"--section-divs", "--section-divs",
"--id-prefix=", "--id-prefix=",
"--html-q-tags", "--html-q-tags",
"--template=/tpl/viewer-template.html", "--template=" + tplPath,
} }
if !m.NoTOC { if !m.NoTOC {
cmd = append(cmd, "--toc", "--toc-depth=6") cmd = append(cmd, "--toc", "--toc-depth=6")
} }
cmd = append(cmd, metadataArgs(m)...) cmd = append(cmd, metadataArgs(m)...)
cmd = append(cmd, "--output=-", "-") cmd = append(cmd, "--output=-", "-")
return r.Run(ctx, currentPandocBinary(), source, scratch, cmd)
mounts := []string{scratch + ":/tpl:ro"}
return r.Run(ctx, pandocTool(), source, mounts, cmd)
} }
// ToPDF renders source markdown to PDF in two stages: pandoc produces // ToPDF renders source markdown to PDF in two stages: pandoc
// HTML using viewer-template.html (stage 1, pandoc image), then headless // produces HTML using viewer-template.html (stage 1), then headless
// Chromium prints that HTML to PDF (stage 2, chromium image). The // chromium prints that HTML to PDF (stage 2). The two-stage choice
// two-stage choice preserves the print-media CSS already authored in // preserves the print-media CSS already authored in viewer-
// viewer-template.html — pandoc's native --pdf-engine path uses LaTeX // template.html — pandoc's native --pdf-engine path uses LaTeX
// which would bypass it entirely. // which would bypass it entirely.
// //
// Chromium runs from the alpine-chrome image whose entrypoint is // Both stages share a single per-call scratch dir: pandoc writes
// `chromium-browser`; our cmd is the flag list passed straight to that // `in.html` and chromium reads it, then chromium writes `out.pdf`
// binary. The host scratch dir is bind-mounted read-write at /pdf so // which the host reads back. The wrapper bind-mounts the scratch
// chromium can write out.pdf and we read it back afterward. // dir read-write into the sandbox at the same path.
func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) { func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
html, err := ToHTML(ctx, source, m) html, err := ToHTML(ctx, source, m)
if err != nil { if err != nil {
@ -271,17 +216,11 @@ func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
return nil, err return nil, err
} }
mounts := []string{scratch + ":/pdf:rw"} // --no-sandbox: the wrapper provides the sandbox; chromium's
// alpine-chrome's entrypoint is `chromium-browser`. --no-sandbox is // own setuid sandbox would conflict (and fails inside our
// required because the container drops CAP_SYS_ADMIN; the threat // user-namespace anyway). --disable-dev-shm-usage: chromium's
// model is "malicious markdown drives chromium RCE", contained by // shared-memory fallback writes to /dev/shm which our sandbox
// --network=none + --cap-drop=ALL + --read-only + tmpfs. // doesn't expose; redirect to /tmp (the wrapper's tmpfs).
//
// --disable-dev-shm-usage: without this, chromium tries to allocate
// shared memory under /dev/shm, which our --read-only container
// can't write to. The flag tells chromium to fall back to /tmp,
// which is a writable tmpfs (sized in runner.go). Standard fix for
// chromium-in-container; required by every CI/headless setup.
cmd := []string{ cmd := []string{
"--headless", "--headless",
"--disable-gpu", "--disable-gpu",
@ -290,10 +229,10 @@ func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
"--user-data-dir=/tmp/chrome", "--user-data-dir=/tmp/chrome",
"--no-pdf-header-footer", "--no-pdf-header-footer",
"--virtual-time-budget=10000", "--virtual-time-budget=10000",
"--print-to-pdf=/pdf/out.pdf", "--print-to-pdf=" + pdfPath,
"file:///pdf/in.html", "file://" + htmlPath,
} }
if _, err := r.Run(ctx, chromiumTool(), nil, mounts, cmd); err != nil { if _, err := r.Run(ctx, currentChromiumBinary(), nil, scratch, cmd); err != nil {
return nil, err return nil, err
} }
@ -303,7 +242,7 @@ func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
} }
if len(out) < 4 || string(out[:4]) != "%PDF" { if len(out) < 4 || string(out[:4]) != "%PDF" {
return nil, &ConvertError{ return nil, &ConvertError{
Tool: "chromium", Tool: currentChromiumBinary(),
ExitCode: 0, ExitCode: 0,
Stderr: "chromium did not produce a valid PDF", Stderr: "chromium did not produce a valid PDF",
Cause: fmt.Errorf("invalid PDF magic in output (got %d bytes)", len(out)), Cause: fmt.Errorf("invalid PDF magic in output (got %d bytes)", len(out)),
@ -312,9 +251,9 @@ func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
return out, nil return out, nil
} }
// metadataArgs renders Metadata into pandoc -V flags. Order is stable // metadataArgs renders Metadata into pandoc -V flags. Order is
// so test fixtures don't churn. Empty values are omitted (the template // stable so test fixtures don't churn. Empty values are omitted
// uses $if(...)$ blocks). // (the template uses $if(...)$ blocks).
func metadataArgs(m Metadata) []string { func metadataArgs(m Metadata) []string {
var out []string var out []string
add := func(k, v string) { add := func(k, v string) {


@ -10,25 +10,25 @@ import (
) )
// fakeRunner records the args it was invoked with and replays canned // fakeRunner records the args it was invoked with and replays canned
// responses. Lets us assert the command lines + image refs without // responses. Lets us assert command lines + binary refs + scratch
// needing podman. // dirs without needing actual pandoc.
type fakeRunner struct { type fakeRunner struct {
mu sync.Mutex mu sync.Mutex
calls [][]string calls [][]string
tools []ToolSpec binaries []string
stdin [][]byte stdin [][]byte
mounts [][]string scratchDir []string
resp []byte resp []byte
err error err error
} }
func (f *fakeRunner) Run(_ context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error) { func (f *fakeRunner) Run(_ context.Context, binary string, stdin []byte, scratchDir string, cmd []string) ([]byte, error) {
f.mu.Lock() f.mu.Lock()
defer f.mu.Unlock() defer f.mu.Unlock()
f.calls = append(f.calls, append([]string(nil), cmd...)) f.calls = append(f.calls, append([]string(nil), cmd...))
f.tools = append(f.tools, tool) f.binaries = append(f.binaries, binary)
f.stdin = append(f.stdin, append([]byte(nil), stdin...)) f.stdin = append(f.stdin, append([]byte(nil), stdin...))
f.mounts = append(f.mounts, append([]string(nil), mounts...)) f.scratchDir = append(f.scratchDir, scratchDir)
return f.resp, f.err return f.resp, f.err
} }
@ -38,14 +38,14 @@ func (f *fakeRunner) lastCall() (string, []string) {
if len(f.calls) == 0 { if len(f.calls) == 0 {
return "", nil return "", nil
} }
return f.tools[len(f.tools)-1].Image, f.calls[len(f.calls)-1] return f.binaries[len(f.binaries)-1], f.calls[len(f.calls)-1]
} }
func TestToDocx_UsesPandocImage(t *testing.T) { func TestToDocx_UsesPandocBinary(t *testing.T) {
f := &fakeRunner{resp: []byte("FAKE-DOCX")} f := &fakeRunner{resp: []byte("FAKE-DOCX")}
InstallRunner(f) InstallRunner(f)
t.Cleanup(func() { InstallRunner(nil) }) t.Cleanup(func() { InstallRunner(nil) })
SetImages("docker.io/pandoc/latex:latest", "") SetBinaries("pandoc", "chromium-browser")
out, err := ToDocx(context.Background(), []byte("# Hello\n"), Metadata{ out, err := ToDocx(context.Background(), []byte("# Hello\n"), Metadata{
Title: "Hello", Title: "Hello",
@ -57,9 +57,9 @@ func TestToDocx_UsesPandocImage(t *testing.T) {
if string(out) != "FAKE-DOCX" { if string(out) != "FAKE-DOCX" {
t.Errorf("unexpected output: %q", out) t.Errorf("unexpected output: %q", out)
} }
image, call := f.lastCall() binary, call := f.lastCall()
if image != "docker.io/pandoc/latex:latest" { if binary != "pandoc" {
t.Errorf("expected pandoc image, got %q", image) t.Errorf("expected pandoc binary, got %q", binary)
} }
if !contains(call, "--to=docx") { if !contains(call, "--to=docx") {
t.Errorf("missing --to=docx: %v", call) t.Errorf("missing --to=docx: %v", call)
@ -74,35 +74,40 @@ func TestToDocx_UsesPandocImage(t *testing.T) {
if call[len(call)-1] != "-" { if call[len(call)-1] != "-" {
t.Errorf("expected stdin marker as last arg, got %q", call[len(call)-1]) t.Errorf("expected stdin marker as last arg, got %q", call[len(call)-1])
} }
// ToDocx is stdin → stdout — no scratch dir needed.
if f.scratchDir[len(f.scratchDir)-1] != "" {
t.Errorf("ToDocx should not need a scratch dir, got %q", f.scratchDir[len(f.scratchDir)-1])
}
} }
func TestToHTML_UsesTemplateAndMountsScratch(t *testing.T) { func TestToHTML_UsesTemplateFromScratchDir(t *testing.T) {
f := &fakeRunner{resp: []byte("<html>fake</html>")} f := &fakeRunner{resp: []byte("<html>fake</html>")}
InstallRunner(f) InstallRunner(f)
t.Cleanup(func() { InstallRunner(nil) }) t.Cleanup(func() { InstallRunner(nil) })
SetImages("docker.io/pandoc/latex:latest", "") SetBinaries("pandoc", "chromium-browser")
_, err := ToHTML(context.Background(), []byte("# Hi\n"), Metadata{Title: "Hi"}) _, err := ToHTML(context.Background(), []byte("# Hi\n"), Metadata{Title: "Hi"})
if err != nil { if err != nil {
t.Fatalf("ToHTML: %v", err) t.Fatalf("ToHTML: %v", err)
} }
image, call := f.lastCall() binary, call := f.lastCall()
if image != "docker.io/pandoc/latex:latest" { if binary != "pandoc" {
t.Errorf("expected pandoc image, got %q", image) t.Errorf("expected pandoc binary, got %q", binary)
} }
if !contains(call, "--template=/tpl/viewer-template.html") { // Template flag must reference an absolute path under the scratch
t.Errorf("template flag missing: %v", call) // dir (no /tpl indirection anymore — the wrapper bind-mounts the
// scratch dir at its own path, so absolute host paths just work).
scratch := f.scratchDir[len(f.scratchDir)-1]
if scratch == "" {
t.Fatalf("ToHTML must pass a scratch dir to the runner")
}
wantTpl := "--template=" + scratch + "/viewer-template.html"
if !contains(call, wantTpl) {
t.Errorf("template flag missing/wrong; want %q in %v", wantTpl, call)
} }
if !contains(call, "--toc") { if !contains(call, "--toc") {
t.Errorf("TOC flag missing (default NoTOC=false): %v", call) t.Errorf("TOC flag missing (default NoTOC=false): %v", call)
} }
if len(f.mounts) == 0 || len(f.mounts[0]) == 0 {
t.Fatalf("expected at least one bind mount for /tpl")
}
mount := f.mounts[0][0]
if !strings.Contains(mount, ":/tpl:") {
t.Errorf("mount missing /tpl: %q", mount)
}
} }
func TestToHTML_NoTOCSuppressesTOC(t *testing.T) { func TestToHTML_NoTOCSuppressesTOC(t *testing.T) {
@ -120,9 +125,9 @@ func TestToHTML_NoTOCSuppressesTOC(t *testing.T) {
} }
} }
// recordingRunner records every call and returns canned responses // recordingRunner records every call and returns canned responses in
// in sequence. Lets ToPDF tests assert the two-stage pipeline // sequence. Lets ToPDF tests assert the two-stage pipeline (pandoc
// (pandoc image then chromium image). // then chromium).
type recordingRunner struct { type recordingRunner struct {
mu sync.Mutex mu sync.Mutex
calls []recordedCall calls []recordedCall
@ -132,18 +137,18 @@ type recordingRunner struct {
} }
type recordedCall struct { type recordedCall struct {
image string binary string
cmd []string cmd []string
mounts []string scratch string
} }
func (r *recordingRunner) Run(_ context.Context, tool ToolSpec, _ []byte, mounts []string, cmd []string) ([]byte, error) { func (r *recordingRunner) Run(_ context.Context, binary string, _ []byte, scratch string, cmd []string) ([]byte, error) {
r.mu.Lock() r.mu.Lock()
defer r.mu.Unlock() defer r.mu.Unlock()
r.calls = append(r.calls, recordedCall{ r.calls = append(r.calls, recordedCall{
image: tool.Image, binary: binary,
cmd: append([]string(nil), cmd...), cmd: append([]string(nil), cmd...),
mounts: append([]string(nil), mounts...), scratch: scratch,
}) })
if r.cursor >= len(r.resp) { if r.cursor >= len(r.resp) {
return nil, nil return nil, nil
@ -169,57 +174,63 @@ func TestScratchDir_UsedByToHTML(t *testing.T) {
if err != nil { if err != nil {
t.Fatalf("ToHTML: %v", err) t.Fatalf("ToHTML: %v", err)
} }
if len(f.mounts) == 0 || len(f.mounts[0]) == 0 { if len(f.scratchDir) == 0 {
t.Fatalf("expected at least one mount") t.Fatalf("expected a scratch dir to be passed to the runner")
} }
mount := f.mounts[0][0] // "<host>:/tpl:ro" got := f.scratchDir[0]
if !strings.HasPrefix(mount, scratchRoot+"/") { if !strings.HasPrefix(got, scratchRoot+"/") {
t.Errorf("scratch dir not under configured root: %q (root=%q)", mount, scratchRoot) t.Errorf("scratch dir not under configured root: %q (root=%q)", got, scratchRoot)
} }
} }
func TestToPDF_TwoStagePipeline(t *testing.T) { func TestToPDF_TwoStagePipeline(t *testing.T) {
// Stage 1: pandoc emits HTML. Stage 2: chromium reads HTML from // Stage 1: pandoc emits HTML. Stage 2: chromium reads HTML from
// the bind mount and writes /pdf/out.pdf. The fake runner can't // the scratch dir and writes out.pdf there. The fake runner can't
// actually write the PDF, so we expect ToPDF to fail at the // actually write the PDF, so we expect ToPDF to fail at the
// read-back step — but we can still assert the two-stage call // read-back step — but we can still assert the two-stage call
// shape and the right image per stage. // shape and the right binary per stage.
r := &recordingRunner{ r := &recordingRunner{
resp: [][]byte{ resp: [][]byte{
[]byte("<html><body>fake</body></html>"), // stage 1 stdout []byte("<html><body>fake</body></html>"), // stage 1 stdout
nil, // stage 2 stdout (chromium writes PDF to bind mount) nil, // stage 2 stdout (chromium writes PDF to scratch)
}, },
} }
InstallRunner(r) InstallRunner(r)
t.Cleanup(func() { InstallRunner(nil) }) t.Cleanup(func() { InstallRunner(nil) })
SetImages("docker.io/pandoc/latex:latest", "docker.io/zenika/alpine-chrome:latest") SetBinaries("pandoc", "chromium-browser")
_, err := ToPDF(context.Background(), []byte("# Hi\n"), Metadata{}) _, err := ToPDF(context.Background(), []byte("# Hi\n"), Metadata{})
// PDF read-back will fail (fake runner didn't write the file) — // PDF read-back will fail (fake runner didn't write the file) —
// that's expected for this test which only inspects the call // that's expected for this test which only inspects the call shape.
// shape.
if err == nil { if err == nil {
t.Fatalf("expected error from PDF read-back; got nil") t.Fatalf("expected error from PDF read-back; got nil")
} }
if len(r.calls) != 2 { if len(r.calls) != 2 {
t.Fatalf("expected 2 container calls (pandoc + chromium); got %d", len(r.calls)) t.Fatalf("expected 2 calls (pandoc + chromium); got %d", len(r.calls))
} }
if r.calls[0].image != "docker.io/pandoc/latex:latest" { if r.calls[0].binary != "pandoc" {
t.Errorf("stage 1 image: got %q want pandoc/latex", r.calls[0].image) t.Errorf("stage 1 binary: got %q want pandoc", r.calls[0].binary)
} }
if r.calls[1].image != "docker.io/zenika/alpine-chrome:latest" { if r.calls[1].binary != "chromium-browser" {
t.Errorf("stage 2 image: got %q want alpine-chrome", r.calls[1].image) t.Errorf("stage 2 binary: got %q want chromium-browser", r.calls[1].binary)
} }
// Stage 2 must include the --print-to-pdf flag pointing at /pdf. // Stage 2 must include --print-to-pdf pointing at an absolute
if !contains(r.calls[1].cmd, "--print-to-pdf=/pdf/out.pdf") { // path under the scratch dir.
t.Errorf("chromium call missing --print-to-pdf flag: %v", r.calls[1].cmd) stage2 := r.calls[1]
if stage2.scratch == "" {
t.Fatalf("chromium call must have a scratch dir")
} }
if !contains(r.calls[1].cmd, "--no-sandbox") { wantPDF := "--print-to-pdf=" + stage2.scratch + "/out.pdf"
t.Errorf("chromium call missing --no-sandbox: %v", r.calls[1].cmd) if !contains(stage2.cmd, wantPDF) {
t.Errorf("chromium call missing --print-to-pdf=%s/out.pdf: %v", stage2.scratch, stage2.cmd)
} }
// Stage 2's bind mount must be writable (chromium writes the PDF). if !contains(stage2.cmd, "--no-sandbox") {
if len(r.calls[1].mounts) == 0 || !strings.Contains(r.calls[1].mounts[0], ":rw") { t.Errorf("chromium call missing --no-sandbox: %v", stage2.cmd)
t.Errorf("chromium mount must be :rw, got %v", r.calls[1].mounts) }
// Stage 2 chromium reads file://<scratch>/in.html.
wantHTML := "file://" + stage2.scratch + "/in.html"
if !contains(stage2.cmd, wantHTML) {
t.Errorf("chromium call missing file:// URL: %v", stage2.cmd)
} }
} }
@ -255,21 +266,6 @@ func TestMetadataArgs_OmitsEmptyAndOrdersStably(t *testing.T) {
} }
} }
func TestImageTag(t *testing.T) {
cases := map[string]string{
"docker.io/pandoc/latex:latest": "pandoc/latex",
"docker.io/zenika/alpine-chrome:latest": "zenika/alpine-chrome",
"pandoc/core": "pandoc/core",
"quay.io/example/foo:v1": "example/foo",
"alpine": "alpine",
}
for in, want := range cases {
if got := imageTag(in); got != want {
t.Errorf("imageTag(%q) = %q, want %q", in, got, want)
}
}
}
func TestSingleflight_Collapses(t *testing.T) { func TestSingleflight_Collapses(t *testing.T) {
var g singleflightGroup var g singleflightGroup
const N = 50 const N = 50
@ -305,113 +301,3 @@ func contains(haystack []string, needle string) bool {
} }
return false return false
} }
// TestToolSpecPopulation: the convert entry points populate BOTH the
// Image and Binary fields of ToolSpec, so the runner-of-the-day can
// pick whichever it needs. bwrapRunner reads Binary; containerRunner
// reads Image; the call site doesn't know which is installed.
func TestToolSpecPopulation(t *testing.T) {
f := &fakeRunner{resp: []byte("ok")}
InstallRunner(f)
t.Cleanup(func() { InstallRunner(nil) })
SetImages("docker.io/pandoc/latex:1.0", "docker.io/zenika/alpine-chrome:2.0")
SetBinaries("/opt/bin/pandoc", "/opt/bin/chromium")
t.Cleanup(func() { SetImages("", ""); SetBinaries("", "") })
if _, err := ToDocx(context.Background(), []byte("# x\n"), Metadata{}); err != nil {
t.Fatalf("ToDocx: %v", err)
}
if len(f.tools) != 1 {
t.Fatalf("want 1 tool call, got %d", len(f.tools))
}
got := f.tools[0]
if got.Image != "docker.io/pandoc/latex:1.0" {
t.Errorf("Image = %q, want docker.io/pandoc/latex:1.0", got.Image)
}
if got.Binary != "/opt/bin/pandoc" {
t.Errorf("Binary = %q, want /opt/bin/pandoc", got.Binary)
}
}
// TestBwrapArgs_SandboxFlagsPresent locks in the bwrap argv shape.
// Every conversion must run with these hardening flags — the whole
// point of bwrap-as-default is that the sandbox is built into every
// invocation. A refactor that drops any of them needs to fail this
// test loudly.
func TestBwrapArgs_SandboxFlagsPresent(t *testing.T) {
args, err := buildBwrapArgs("pandoc", nil, []string{"--from=markdown", "--to=docx", "-"})
if err != nil {
t.Fatalf("buildBwrapArgs: %v", err)
}
mustHave := []string{
"--unshare-all", // net + pid + ipc + uts + cgroup
"--unshare-user-try", // user-namespace when kernel allows
"--die-with-parent", // cleanup when zddc-server exits
"--proc", // minimal /proc
"--dev", // minimal /dev
"--tmpfs", // writable /tmp scratch
"--clearenv", // no host env leaks
}
for _, flag := range mustHave {
if !contains(args, flag) {
t.Errorf("bwrap args missing sandbox flag %q: %v", flag, args)
}
}
// /usr must be bind-mounted read-only — that's how the binary
// + its dynamic libs are visible inside the sandbox. The
// "--ro-bind /usr /usr" triple must appear consecutively.
if i := indexOfTriple(args, "--ro-bind", "/usr", "/usr"); i < 0 {
t.Errorf("bwrap args missing --ro-bind /usr /usr: %v", args)
}
// Binary + caller-cmd come last, in order.
last := args[len(args)-4:]
want := []string{"pandoc", "--from=markdown", "--to=docx", "-"}
for i, w := range want {
if last[i] != w {
t.Errorf("trailing args[%d] = %q, want %q", i, last[i], w)
}
}
}
// TestBwrapArgs_MountTranslation: caller "host:target:ro" → bwrap
// "--ro-bind host target"; "host:target:rw" → "--bind host target";
// no mode segment defaults to ro (mirroring containerRunner).
func TestBwrapArgs_MountTranslation(t *testing.T) {
args, err := buildBwrapArgs("pandoc",
[]string{"/host/tpl:/tpl:ro", "/host/pdf:/pdf:rw", "/host/x:/x"},
nil)
if err != nil {
t.Fatalf("buildBwrapArgs: %v", err)
}
if i := indexOfTriple(args, "--ro-bind", "/host/tpl", "/tpl"); i < 0 {
t.Errorf("missing --ro-bind /host/tpl /tpl: %v", args)
}
if i := indexOfTriple(args, "--bind", "/host/pdf", "/pdf"); i < 0 {
t.Errorf("missing --bind /host/pdf /pdf: %v", args)
}
if i := indexOfTriple(args, "--ro-bind", "/host/x", "/x"); i < 0 {
t.Errorf("missing default-ro --ro-bind /host/x /x: %v", args)
}
}
// TestBwrapArgs_RejectsBadMountSpec: a malformed mount string fails
// fast, never reaches exec. Single-segment specs (no target) and
// unknown modes both qualify.
func TestBwrapArgs_RejectsBadMountSpec(t *testing.T) {
for _, bad := range []string{"only-host", "/h:/t:weird", ""} {
if _, err := buildBwrapArgs("pandoc", []string{bad}, nil); err == nil {
t.Errorf("expected error for malformed mount %q", bad)
}
}
}
// indexOfTriple returns the index of `a` in args such that
// args[i:i+3] == {a, b, c}, or -1.
func indexOfTriple(args []string, a, b, c string) int {
for i := 0; i+2 < len(args); i++ {
if args[i] == a && args[i+1] == b && args[i+2] == c {
return i
}
}
return -1
}


@ -11,51 +11,45 @@ import (
"time" "time"
) )
// remoteURL is set by Probe from cfg.ConvertPodmanSocket. Empty means // Capabilities is the snapshot the convert-health endpoint reports
// local mode. // and the convert entry points consult before exec'ing.
var remoteURL atomic.Pointer[string]
// Capabilities is the snapshot of "can we convert right now?". The
// only hard requirement is a container runtime reachable from
// zddc-server — image presence is left to `--pull=missing` at
// conversion time, so a missing image surfaces as a normal
// ConvertError (not a probe failure).
// //
// Mode applies to OCI engines (podman/docker): "local" when the // In the runtime-image model, "Ready" means both binaries
// engine creates containers in the same process as zddc-server, // (pandoc + chromium) are present on PATH. Sandboxing + resource
// "remote" when zddc-server is the client of a podman-system-service // limits live in the wrapper scripts that PATH resolves to — out
// sidecar. The bwrap engine has no mode (always direct exec). // of zddc-server's concern. The probe doesn't try to validate
// those; if the wrapper is broken, the first conversion surfaces
// the failure as a ConvertError with the wrapper's stderr.
type Capabilities struct { type Capabilities struct {
Engine string // "bwrap" | "podman" | "docker" | "" PandocBinary string // resolved path, e.g. /usr/local/bin/pandoc
EngineVer string // first line of "<engine> --version" PandocVersion string // first line of "pandoc --version"
Mode string // "local" or "remote" (OCI engines only) ChromiumBinary string // resolved path, e.g. /usr/local/bin/chromium-browser
RemoteURL string // populated in remote mode (OCI engines only) ChromiumVersion string // first line of "chromium-browser --version"
PandocImage string // resolved pandoc image ref (OCI engines)
ChromiumImage string // resolved chromium image ref (OCI engines)
ProbedAt time.Time ProbedAt time.Time
Err error Err error
} }
// Ready reports whether conversions can be attempted. The first // Ready reports whether conversions can be attempted.
// conversion may still fail if the configured binary or image isn't
// actually present (the runner will surface a clear error from the
// child process's stderr).
func (c Capabilities) Ready() bool { func (c Capabilities) Ready() bool {
return c.Engine != "" && c.Err == nil return c.PandocBinary != "" && c.ChromiumBinary != "" && c.Err == nil
} }
// Reason returns a short human-friendly explanation when Ready() is // Reason returns a short human-friendly explanation when Ready() is
// false. Used as the body of a 503. // false. Used as the body of a 503.
func (c Capabilities) Reason() string { func (c Capabilities) Reason() string {
if c.Engine == "" {
return "no conversion sandbox found (looked for bwrap, podman, docker on PATH)"
}
if c.Err != nil { if c.Err != nil {
if c.Mode == "remote" {
return fmt.Sprintf("podman remote socket unreachable (%s): %s", c.RemoteURL, c.Err.Error())
}
return c.Err.Error() return c.Err.Error()
} }
var missing []string
if c.PandocBinary == "" {
missing = append(missing, "pandoc")
}
if c.ChromiumBinary == "" {
missing = append(missing, "chromium-browser")
}
if len(missing) > 0 {
return fmt.Sprintf("conversion binary not found on PATH: %s — runtime image is missing the conversion toolchain (see zddc/runtime.Containerfile)", strings.Join(missing, ", "))
}
return "unavailable" return "unavailable"
} }
@@ -74,187 +68,75 @@ func Available() (Capabilities, bool) {
 	return *p, p.Ready()
 }
-// SetRemoteURL installs the podman remote socket URL for subsequent
-// Probe / Reprobe calls. Empty means "local mode" (the engine binary
-// creates containers in the same process). Called from
-// cmd/zddc-server/main.go after flag parsing, before Probe.
-func SetRemoteURL(url string) {
-	s := url
-	remoteURL.Store(&s)
-}
-func currentRemoteURL() string {
-	if p := remoteURL.Load(); p != nil {
-		return *p
-	}
-	return ""
-}
-// Probe locates the container engine and installs a containerRunner
-// as the package default. Call once at server startup. Returns the
-// captured Capabilities for logging.
+// Probe resolves the conversion binaries on PATH and installs the
+// localRunner. Call once at server startup. Returns the captured
+// Capabilities for logging.
 //
-// Engine order: engineOverride (if non-empty) → podman → docker. First
-// hit wins. Image presence is NOT probed: the runner uses
-// `--pull=missing` so the first conversion request will pull whichever
-// image it needs.
+// Image responsibility: the binaries on PATH should be the wrapper
+// scripts at /usr/local/bin/{pandoc,chromium-browser} (shipped by
+// zddc/runtime.Containerfile). Each wrapper handles cgroup setup
+// + bwrap sandbox + exec of the real binary at /usr/bin/<name>.
+// If an operator runs zddc-server outside the runtime image with
+// raw pandoc / chromium on PATH, the conversion still works but
+// without the per-call sandbox + resource caps.
 //
-// In remote mode (SetRemoteURL with non-empty URL), the probe also
-// invokes `<engine> --remote --url=<url> version` to confirm the
-// sidecar's socket is reachable. A reachable-engine-but-unreachable-
-// socket state surfaces as Ready=false so conversion requests serve
-// 503 until the sidecar comes up.
-//
-// Any failure here is non-fatal: the server still starts, conversion
+// Failure here is non-fatal: the server still starts, conversion
 // endpoints just return 503.
-func Probe(ctx context.Context, engineOverride string) Capabilities {
+func Probe(ctx context.Context) Capabilities {
 	probeCool.Lock()
 	defer probeCool.Unlock()
-	now := time.Now()
-	rURL := currentRemoteURL()
-	c := Capabilities{
-		PandocImage:   currentPandocImage(),
-		ChromiumImage: currentChromiumImage(),
-		Mode:          "local",
-		RemoteURL:     rURL,
-		ProbedAt:      now,
-	}
-	if rURL != "" {
-		c.Mode = "remote"
-	}
-	enginePath := resolveEngine(engineOverride)
-	if enginePath == "" {
-		c.Err = fmt.Errorf("no conversion sandbox found (tried: %s)", strings.Join(enginesTried(engineOverride), ", "))
+	c := Capabilities{ProbedAt: time.Now()}
+	pandocBin := currentPandocBinary()
+	chromiumBin := currentChromiumBinary()
+	if p, err := exec.LookPath(pandocBin); err == nil {
+		c.PandocBinary = p
+		if v, err := probeVersion(ctx, p); err == nil {
+			c.PandocVersion = v
+		}
+	}
+	if p, err := exec.LookPath(chromiumBin); err == nil {
+		c.ChromiumBinary = p
+		if v, err := probeVersion(ctx, p); err == nil {
+			c.ChromiumVersion = v
+		}
+	}
+	if c.PandocBinary == "" || c.ChromiumBinary == "" {
+		c.Err = fmt.Errorf("%s", c.Reason())
 		caps.Store(&c)
 		slog.Warn("convert: probe failed", "reason", c.Err.Error())
 		return c
 	}
-	kind := engineKind(enginePath)
-	c.Engine = kind
-	if v, err := probeVersion(ctx, enginePath); err == nil {
-		c.EngineVer = v
-	}
-	// bwrap engine: no remote-mode concept, just install the runner.
-	// The bwrap binary IS the sandbox; conversion binaries (pandoc,
-	// chromium) are resolved separately from PATH at call time and
-	// reported by the convert-health endpoint when ready.
-	if kind == "bwrap" {
-		InstallRunner(newBwrapRunner(enginePath))
+	InstallRunner(newLocalRunner())
 	caps.Store(&c)
 	slog.Info("convert: ready",
-		"engine", kind,
-		"engine_path", enginePath,
-		"engine_version", c.EngineVer,
-		"pandoc_binary", currentPandocBinary(),
-		"chromium_binary", currentChromiumBinary())
-		return c
-	}
-	// Legacy OCI engine (podman/docker). Optional remote-socket
-	// connectivity check, then install containerRunner.
-	if rURL != "" {
-		if err := probeRemoteSocket(ctx, enginePath, rURL); err != nil {
-			c.Err = err
-			caps.Store(&c)
-			slog.Warn("convert: remote socket probe failed",
-				"engine", kind, "remote_url", rURL, "err", err)
-			return c
-		}
-	}
-	InstallRunner(newContainerRunner(enginePath, rURL))
-	caps.Store(&c)
-	slog.Info("convert: ready",
-		"engine", kind,
-		"engine_path", enginePath,
-		"engine_version", c.EngineVer,
-		"mode", c.Mode,
-		"remote_url", c.RemoteURL,
-		"pandoc_image", c.PandocImage,
-		"chromium_image", c.ChromiumImage)
+		"pandoc_binary", c.PandocBinary,
+		"pandoc_version", c.PandocVersion,
+		"chromium_binary", c.ChromiumBinary,
+		"chromium_version", c.ChromiumVersion)
 	return c
 }
-// probeRemoteSocket runs `<engine> --remote --url=<url> version` with
-// a short timeout. Returns nil on success; a wrapped error otherwise.
-// The remote URL is typically a Unix socket path
-// (unix:///var/run/podman/podman.sock) in the sidecar pattern but a
-// TCP form (tcp://host:port) is accepted too.
-func probeRemoteSocket(ctx context.Context, engine, url string) error {
-	c := exec.CommandContext(ctx, engine, "--remote", "--url="+url, "version", "--format={{.Client.Version}}")
-	out, err := c.CombinedOutput()
-	if err != nil {
-		return fmt.Errorf("podman --remote version: %w (output: %s)", err, strings.TrimSpace(string(out)))
-	}
-	return nil
-}
-// Reprobe re-runs Probe with the existing configuration. Used by the
-// handler when a request hits a not-Ready state — gives the operator
-// a way to recover (e.g. installed podman after the server started)
-// without a server restart. Cooldown of 60 s between probes to keep
-// error-path requests cheap.
-func Reprobe(ctx context.Context, engineOverride string) Capabilities {
+// Reprobe re-runs Probe with the existing configuration. Used by
+// the handler when a request hits a not-Ready state — gives the
+// operator a way to recover (e.g. installed pandoc after server
+// start) without a server restart. Cooldown of 60 s between probes
+// to keep error-path requests cheap.
+func Reprobe(ctx context.Context) Capabilities {
 	if p := caps.Load(); p != nil {
 		if time.Since(p.ProbedAt) < 60*time.Second {
 			return *p
 		}
 	}
-	return Probe(ctx, engineOverride)
+	return Probe(ctx)
 }
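The 60 s cooldown in Reprobe is a cached-snapshot pattern: return the last result while it is fresh, otherwise probe again and store the new one. Below is a minimal sketch of that idea, assuming a hypothetical `prober` type, an injected clock value so the behaviour can be shown without sleeping, and `atomic.Pointer` (Go 1.19+) for the cached snapshot. It omits the `probeCool` mutex that the real Probe holds.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// snapshot is a hypothetical stand-in for Capabilities.
type snapshot struct {
	ProbedAt time.Time
	Ready    bool
}

// prober caches the most recent snapshot and re-runs probe only
// once the cooldown has elapsed. All names here are illustrative.
type prober struct {
	last     atomic.Pointer[snapshot]
	cooldown time.Duration
	probe    func() bool
}

func newProber(cooldownSec int, probe func() bool) *prober {
	return &prober{cooldown: time.Duration(cooldownSec) * time.Second, probe: probe}
}

// at builds a fixed timestamp so the example is deterministic.
func at(sec int) time.Time { return time.Unix(int64(sec), 0) }

// reprobe returns the cached snapshot while it is younger than the
// cooldown; otherwise it probes again and stores the result.
func (p *prober) reprobe(now time.Time) snapshot {
	if s := p.last.Load(); s != nil && now.Sub(s.ProbedAt) < p.cooldown {
		return *s
	}
	s := snapshot{ProbedAt: now, Ready: p.probe()}
	p.last.Store(&s)
	return s
}

func main() {
	calls := 0
	p := newProber(60, func() bool { calls++; return false })
	p.reprobe(at(0))  // first call: probes
	p.reprobe(at(10)) // within cooldown: cached
	p.reprobe(at(61)) // cooldown elapsed: probes again
	fmt.Println("probe calls:", calls)
}
```

Three requests on the error path cost two probes here; a burst of failing requests inside one cooldown window would cost one.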
-func resolveEngine(override string) string {
-	if override != "" {
-		if p, err := exec.LookPath(override); err == nil {
-			return p
-		}
-		return ""
-	}
-	// Probe order: bwrap (production default — lightest sandbox, no
-	// daemon, no OCI engine), then podman / docker as legacy fallbacks
-	// for hosts that already have a container engine and want OCI-image
-	// isolation per conversion.
-	for _, name := range []string{"bwrap", "podman", "docker"} {
-		if p, err := exec.LookPath(name); err == nil {
-			return p
-		}
-	}
-	return ""
-}
-func enginesTried(override string) []string {
-	if override != "" {
-		return []string{override}
-	}
-	return []string{"bwrap", "podman", "docker"}
-}
-// engineKind returns the engine-family label for a resolved binary
-// path. "bwrap" is its own engine; "podman" and "docker" are the
-// OCI-container engines handled by containerRunner. Used by Probe to
-// pick the right Runner implementation.
-func engineKind(resolved string) string {
-	base := resolved
-	if i := strings.LastIndex(base, "/"); i >= 0 {
-		base = base[i+1:]
-	}
-	switch base {
-	case "bwrap":
-		return "bwrap"
-	case "podman", "podman-remote":
-		return "podman"
-	case "docker":
-		return "docker"
-	}
-	return base
-}
-func probeVersion(ctx context.Context, engine string) (string, error) {
-	c := exec.CommandContext(ctx, engine, "--version")
+func probeVersion(ctx context.Context, binary string) (string, error) {
+	c := exec.CommandContext(ctx, binary, "--version")
 	out, err := c.CombinedOutput()
 	if err != nil {
 		return "", err


@@ -10,60 +10,45 @@ import (
 	"os"
 	"os/exec"
 	"path/filepath"
-	"strings"
 	"sync"
 	"time"
 )
-// ToolSpec identifies the conversion tool to invoke. Runners pick
-// whichever field applies to them:
+// Runner executes a conversion binary and returns its stdout. The
+// production implementation (localRunner) just exec's the binary
+// directly. Tests use a fake.
 //
-// - bwrapRunner uses Binary — the path or PATH-name of the tool on
-// the zddc-server host (or container). pandoc/latex's entrypoint
-// becomes `pandoc`; alpine-chrome's becomes `chromium-browser`.
-// This is the production-default engine: lightest sandbox, no
-// daemon, no privileged outer container.
+// binary is the PATH-resolvable name (or absolute path) of the
+// conversion tool — typically "pandoc" or "chromium-browser". In the
+// production runtime image those names resolve to wrapper scripts at
+// /usr/local/bin/ that put the real binary into a cgroup + bwrap
+// sandbox before exec'ing it. From zddc-server's perspective, that
+// indirection is invisible: it just sees pandoc behavior.
 //
-// - containerRunner uses Image — the OCI image ref pulled into a
-// fresh container for each conversion (legacy/fallback engine,
-// kept for environments that already host a podman/docker daemon
-// and want OCI-image isolation per conversion).
+// stdin is piped to the binary's stdin. scratchDir is an optional
+// host directory the binary needs to read from / write to (template
+// + intermediate HTML + PDF output); passed to the child via the
+// ZDDC_SCRATCH env var, which the wrapper script bind-mounts into
+// the sandbox at the same path. Empty means "no scratch dir
+// needed" (DOCX flow — stdin to stdout, no files).
 //
-// Both fields are populated by the entry points in convert.go so a
-// single call site works regardless of which engine is installed.
-type ToolSpec struct {
-	Image  string // OCI image ref (containerRunner)
-	Binary string // binary name on PATH (bwrapRunner)
-}
-// Runner executes a conversion sub-process and returns its stdout.
-// The host-side implementations are bwrapRunner (default; wraps
-// `bubblewrap`) and containerRunner (fallback; wraps `podman run` /
-// `docker run`). Tests use a fake.
+// cmd is the argv passed to the binary. Same shape across all
+// runners; no shell quoting; no engine-specific flags.
 //
-// stdin is piped to the tool's stdin. cmd is the argv passed *to the
-// tool* — for pandoc the entrypoint accepts pandoc flags directly;
-// for chromium it accepts chromium-browser flags. mounts is a list
-// of "<hostPath>:<targetPath>" specs (":ro" is added if no mode
-// segment is present); each runner translates them to its own
-// bind/--volume syntax.
-//
-// All exec calls in this package go through Runner.Run. This is the
-// first os/exec site in the codebase; the hardening here is the
-// pattern for future shell-outs.
+// All exec calls in this package go through Runner.Run.
 type Runner interface {
-	Run(ctx context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error)
+	Run(ctx context.Context, binary string, stdin []byte, scratchDir string, cmd []string) ([]byte, error)
 }
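The doc comment notes that tests use a fake Runner. One way such a fake might look is sketched below, assuming hypothetical names (`fakeRunner`, `call`, `runOnce`); only the `Run` signature is taken from the diff. The fake records each invocation and returns canned output, so a test can assert on the argv without exec'ing anything.

```go
package main

import (
	"context"
	"fmt"
)

// Runner matches the simplified interface from the diff.
type Runner interface {
	Run(ctx context.Context, binary string, stdin []byte, scratchDir string, cmd []string) ([]byte, error)
}

// call records one invocation (hypothetical helper type).
type call struct {
	Binary     string
	ScratchDir string
	Cmd        []string
}

// fakeRunner is an illustrative test double: it never executes a
// process, it just remembers what it was asked to run.
type fakeRunner struct {
	calls []call
	out   []byte
	err   error
}

func (f *fakeRunner) Run(_ context.Context, binary string, _ []byte, scratchDir string, cmd []string) ([]byte, error) {
	f.calls = append(f.calls, call{Binary: binary, ScratchDir: scratchDir, Cmd: cmd})
	return f.out, f.err
}

// runOnce is a hypothetical caller that uses the stdin-to-stdout
// shape (empty scratchDir), as the DOCX flow is described.
func runOnce(r Runner, binary string, args ...string) ([]byte, error) {
	return r.Run(context.Background(), binary, []byte("# hello"), "", args)
}

func main() {
	f := &fakeRunner{out: []byte("<h1>hello</h1>")}
	out, err := runOnce(f, "pandoc", "--from=markdown", "--to=html")
	fmt.Println(string(out), err)
	fmt.Println(f.calls[0].Binary, f.calls[0].Cmd)
}
```

Because the interface no longer carries a ToolSpec or a mount list, the fake has nothing engine-specific to model: a binary name, an optional scratch path, and an argv are the whole contract.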
-// ErrUnavailable means no container runtime is present on the host.
-// Handlers translate to HTTP 503.
+// ErrUnavailable means the conversion binary couldn't be found on
+// PATH. Handlers translate to HTTP 503.
 var ErrUnavailable = errors.New("conversion unavailable")
 // ConvertError carries the failure surface from a non-zero exit.
-// Stderr is captured (truncated to 4 KiB by the runner) so callers can
-// surface pandoc/chromium's own complaint.
+// Stderr is captured (truncated to 4 KiB by the runner) so callers
+// can surface the binary's own complaint.
 type ConvertError struct {
-	Tool     string // image name fragment, used only for logging
+	Tool     string // binary name, used only for logging
 	ExitCode int
 	Stderr   string
 	Cause    error
@@ -74,78 +59,154 @@ func (e *ConvertError) Error() string {
 		return "<nil>"
 	}
 	if e.Stderr != "" {
-		return fmt.Sprintf("%s exit %d: %s", e.Tool, e.ExitCode, strings.TrimSpace(e.Stderr))
+		return fmt.Sprintf("%s exit %d: %s", e.Tool, e.ExitCode, e.Stderr)
 	}
 	return fmt.Sprintf("%s exit %d: %v", e.Tool, e.ExitCode, e.Cause)
 }
 func (e *ConvertError) Unwrap() error { return e.Cause }
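Because ConvertError implements Unwrap, callers can classify failures with `errors.Is` and `errors.As` even after the error has been wrapped. The sketch below assumes a hypothetical `statusFor` helper; the 503 mapping for ErrUnavailable comes from the comments in the diff, while the 500 used for a ConvertError is a placeholder assumption and not something the source specifies.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrUnavailable = errors.New("conversion unavailable")

// ConvertError reproduces the shape shown in the diff.
type ConvertError struct {
	Tool     string
	ExitCode int
	Stderr   string
	Cause    error
}

func (e *ConvertError) Error() string {
	if e == nil {
		return "<nil>"
	}
	if e.Stderr != "" {
		return fmt.Sprintf("%s exit %d: %s", e.Tool, e.ExitCode, e.Stderr)
	}
	return fmt.Sprintf("%s exit %d: %v", e.Tool, e.ExitCode, e.Cause)
}

func (e *ConvertError) Unwrap() error { return e.Cause }

// statusFor is a hypothetical handler-side helper. Only the 503
// mapping is taken from the source; 500 is an assumed placeholder.
func statusFor(err error) int {
	var ce *ConvertError
	switch {
	case err == nil:
		return 200
	case errors.Is(err, ErrUnavailable):
		return 503
	case errors.As(err, &ce):
		return 500
	default:
		return 500
	}
}

func main() {
	wrapped := fmt.Errorf("to pdf: %w", &ConvertError{Tool: "pandoc", ExitCode: 2, Stderr: "bad input"})
	fmt.Println(statusFor(wrapped), wrapped)
	fmt.Println(statusFor(ErrUnavailable))
}
```

Checking `errors.Is` before `errors.As` means a ConvertError whose Cause is ErrUnavailable still maps to 503, since `errors.Is` follows the Unwrap chain.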
-// containerRunner runs each conversion inside a fresh container.
-// The engine ("podman" preferred, "docker" fallback) is resolved once
-// at startup by Probe. Resource limits are configurable via
-// SetLimits (called from main.go after flag parsing). Images are passed
-// per call so the same runner handles both pandoc and chromium
-// invocations.
+// localRunner exec's the conversion binary directly. The runtime
+// image's wrapper script (at /usr/local/bin/<binary>) handles
+// sandboxing + resource limits BETWEEN this exec and the real
+// binary — invisible to this Runner.
 //
-// Two modes:
-//
-// - **local** (remoteURL=""): the engine binary creates containers
-// directly on the host that runs zddc-server. Used for bare-metal
-// and host-podman deployments. Requires podman or docker on PATH.
-//
-// - **remote** (remoteURL="unix:///var/run/podman/podman.sock" or
-// similar): the engine binary is the local podman CLIENT, invoked
-// as `podman --remote --url=<remoteURL> run …`; the actual
-// container creation happens in whatever process owns the socket
-// (typically a `podman system service` sidecar in the same pod).
-// Used for the Kubernetes sidecar pattern so zddc-server's own
-// pod stays unprivileged. Bind-mount paths must resolve identically
-// on both sides — see scratchDir.
-//
-// The runner relies on `--pull=missing` so the operator never has to
-// pre-pull images: the first request that needs an image pulls it,
-// subsequent requests use the local cache. Both podman and docker
-// honour this flag identically.
-type containerRunner struct {
+// Resource limits stored here are advisory only; the wrapper reads
+// them via env (ZDDC_CONV_MEM_MAX, ZDDC_CONV_PIDS_MAX) and applies
+// them to its transient cgroup. Wall-clock timeout IS enforced
+// here via context.WithTimeout.
+type localRunner struct {
 	mu        sync.RWMutex
-	engine    string
-	remoteURL string
 	memMiB    int
-	cpus      string
 	pids      int
 	timeout   time.Duration
 }
func newLocalRunner() *localRunner {
return &localRunner{
memMiB: 1024, // 1 GiB — matches the wrapper's default
pids: 256,
timeout: 60 * time.Second,
}
}
// SetLimits updates the resource ceilings advertised to the wrapper
// script via env vars + the wall-clock timeout enforced here.
// Zero values keep the previous setting (or constructor defaults).
// Safe to call from multiple goroutines.
func (lr *localRunner) SetLimits(memMiB int, pids int, timeout time.Duration) {
lr.mu.Lock()
defer lr.mu.Unlock()
if memMiB > 0 {
lr.memMiB = memMiB
}
if pids > 0 {
lr.pids = pids
}
if timeout > 0 {
lr.timeout = timeout
}
}
func (lr *localRunner) Run(ctx context.Context, binary string, stdin []byte, scratchDir string, cmd []string) ([]byte, error) {
lr.mu.RLock()
memMiB := lr.memMiB
pids := lr.pids
timeout := lr.timeout
lr.mu.RUnlock()
if binary == "" {
return nil, ErrUnavailable
}
runCtx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
c := exec.CommandContext(runCtx, binary, cmd...)
c.Cancel = func() error {
if c.Process == nil {
return nil
}
return c.Process.Kill()
}
c.WaitDelay = 2 * time.Second
c.SysProcAttr = sysProcAttr()
// Minimal env passed to the wrapper. The wrapper does
// --clearenv inside the bwrap sandbox so the real binary
// sees only what bwrap re-injects (HOME, PATH, LANG). These
// vars are read by the WRAPPER itself, not the binary, to
// drive its cgroup setup + scratch-dir bind mount.
env := []string{
"PATH=" + os.Getenv("PATH"),
"HOME=" + os.TempDir(),
fmt.Sprintf("ZDDC_CONV_MEM_MAX=%dM", memMiB),
fmt.Sprintf("ZDDC_CONV_PIDS_MAX=%d", pids),
}
if scratchDir != "" {
env = append(env, "ZDDC_SCRATCH="+scratchDir)
}
c.Env = env
c.Stdin = bytes.NewReader(stdin)
var stdoutBuf bytes.Buffer
c.Stdout = &limitWriter{w: &stdoutBuf, max: 128 << 20}
stderr := newRingWriter(4 << 10)
c.Stderr = stderr
if err := c.Run(); err != nil {
exitCode := -1
if ee, ok := err.(*exec.ExitError); ok {
exitCode = ee.ExitCode()
}
if runCtx.Err() == context.DeadlineExceeded {
return nil, &ConvertError{
Tool: binary,
ExitCode: exitCode,
Stderr: stderr.String(),
Cause: fmt.Errorf("timeout after %s: %w", timeout, runCtx.Err()),
}
}
return nil, &ConvertError{
Tool: binary,
ExitCode: exitCode,
Stderr: stderr.String(),
Cause: err,
}
}
return stdoutBuf.Bytes(), nil
}
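The environment that localRunner.Run hands to the wrapper is the whole Go-to-image contract for limits and scratch space. A minimal sketch of that construction follows, assuming a hypothetical `wrapperEnv` helper extracted from Run; the variable names and value formats are the ones shown in the diff, and the scratch path in `main` is made up for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// wrapperEnv is a hypothetical extraction of the env-building step
// in localRunner.Run. The wrapper script, not the real binary, is
// the consumer of the ZDDC_* variables.
func wrapperEnv(memMiB, pids int, scratchDir string) []string {
	env := []string{
		"PATH=" + os.Getenv("PATH"),
		"HOME=" + os.TempDir(),
		fmt.Sprintf("ZDDC_CONV_MEM_MAX=%dM", memMiB),
		fmt.Sprintf("ZDDC_CONV_PIDS_MAX=%d", pids),
	}
	// ZDDC_SCRATCH is only set when the caller needs files on disk;
	// the stdin-to-stdout flow leaves it out.
	if scratchDir != "" {
		env = append(env, "ZDDC_SCRATCH="+scratchDir)
	}
	return env
}

func main() {
	for _, kv := range wrapperEnv(1024, 256, "/tmp/zddc-convert-example") {
		fmt.Println(kv)
	}
}
```

With the constructor defaults (1024 MiB, 256 pids) the wrapper would see `ZDDC_CONV_MEM_MAX=1024M` and `ZDDC_CONV_PIDS_MAX=256`; how it translates those into memory.max and pids.max is the image's concern.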
 var (
 	// shared default runner, populated by InstallRunner (called from
-	// the health probe at startup once the engine is known).
+	// the health probe at startup once the binaries are confirmed).
 	defaultRunnerMu sync.RWMutex
 	defaultRunner   Runner
 )
-// InstallRunner sets the package-level Runner used by ToDocx/ToHTML/ToPDF.
-// Tests inject a fake; production code lets the health probe install a
-// containerRunner. Safe to call from multiple goroutines.
+// InstallRunner sets the package-level Runner used by ToDocx/ToHTML/
+// ToPDF. Tests inject a fake; production code lets the health probe
+// install a localRunner. Safe to call from multiple goroutines.
 func InstallRunner(r Runner) {
 	defaultRunnerMu.Lock()
 	defaultRunner = r
 	defaultRunnerMu.Unlock()
 }
-// ConfigureLimits applies resource limits to the package-level Runner,
-// if it's a containerRunner. No-op when no runner is installed yet
-// (the probe failed) or when the installed runner doesn't accept
+// ConfigureLimits applies resource limits to the package-level
+// Runner, if it's a localRunner. No-op when no runner is installed
+// yet (the probe failed) or when the installed runner doesn't accept
 // limits (e.g. a test fake). Zero values keep the previous setting.
 //
-// Called from cmd/zddc-server/main.go after Probe so the limits from
-// the operator's flags take effect before any conversion request lands.
-func ConfigureLimits(memMiB int, cpus string, pids int, timeout time.Duration) {
+// Called from cmd/zddc-server/main.go after Probe so the limits
+// from the operator's flags take effect before any conversion
+// request lands.
+func ConfigureLimits(memMiB int, pids int, timeout time.Duration) {
 	defaultRunnerMu.RLock()
 	r := defaultRunner
 	defaultRunnerMu.RUnlock()
-	if cr, ok := r.(*containerRunner); ok {
-		cr.SetLimits(memMiB, cpus, pids, timeout)
+	if lr, ok := r.(*localRunner); ok {
+		lr.SetLimits(memMiB, pids, timeout)
 	}
 }
@@ -156,428 +217,8 @@ func currentRunner() Runner {
 	return r
 }
-// SetLimits updates the resource ceilings used for subsequent Run
-// invocations. Zero values keep the previous setting (or the defaults
+// limitWriter caps the underlying buffer at max bytes. Writes past
+// the cap return an error which surfaces as a Run() error — the
// set at construction). Safe to call from multiple goroutines.
func (cr *containerRunner) SetLimits(memMiB int, cpus string, pids int, timeout time.Duration) {
cr.mu.Lock()
defer cr.mu.Unlock()
if memMiB > 0 {
cr.memMiB = memMiB
}
if cpus != "" {
cr.cpus = cpus
}
if pids > 0 {
cr.pids = pids
}
if timeout > 0 {
cr.timeout = timeout
}
}
func newContainerRunner(engine, remoteURL string) *containerRunner {
return &containerRunner{
engine: engine,
remoteURL: remoteURL,
memMiB: 512,
cpus: "2",
pids: 100,
timeout: 30 * time.Second,
}
}
// Run executes one container invocation. cmd is the argv passed to the
// image's entrypoint (pandoc for pandoc/latex, chromium-browser for
// alpine-chrome). mounts is a list of "<hostPath>:<containerPath>"
// strings; ":ro" is appended when no mode segment is present. stdin is
// piped to the container, stdout is returned as bytes (capped at
// 128 MiB).
//
// Hardening:
// - --pull=missing: image is fetched on first use, cached after.
// Operator only needs podman/docker installed; no manual pull.
// - --rm: container is removed on exit, even if killed.
// - --network=none: no network inside the container. Prevents data
// exfiltration through embedded URLs in source documents.
// - --read-only + tmpfs on /tmp and /run: image fs is immutable;
// pandoc/chromium scratch goes to tmpfs only.
// - --memory / --cpus / --pids-limit: kernel-enforced caps.
// - --cap-drop=ALL + --security-opt=no-new-privileges: standard
// container-escape hardening.
// - context-cancel kill + WaitDelay: a wedged podman gets force-
// killed; pipes drop after 2s so we don't leak goroutines.
// - cmd.Env minimal: only PATH + HOME are passed through to the
// engine binary; the container itself sees only what the image
// bakes in plus what --env adds (HOME=/tmp).
//
// Note: --user is intentionally NOT set so each image uses its
// default user (pandoc/latex runs as root, alpine-chrome runs as
// uid 1000). With --read-only + tmpfs + --cap-drop=ALL +
// --network=none + --no-new-privileges the additional defense from
// forcing nobody is small and would break alpine-chrome's own
// user-data-dir layout.
func (cr *containerRunner) Run(ctx context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error) {
cr.mu.RLock()
engine := cr.engine
remoteURL := cr.remoteURL
memMiB := cr.memMiB
cpus := cr.cpus
pids := cr.pids
timeout := cr.timeout
cr.mu.RUnlock()
if engine == "" {
return nil, ErrUnavailable
}
image := tool.Image
if image == "" {
return nil, fmt.Errorf("convert.Run: tool.Image is empty (containerRunner requires an OCI image ref)")
}
runCtx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
// Client args. In remote mode, prepend --remote and --url so the
// podman CLI dispatches the request to the sidecar's
// `podman system service` instead of creating a container locally.
// The remaining flags (--rm, --pull=missing, etc.) apply to the
// container that the remote daemon will create — same wire format
// as local mode.
var args []string
if remoteURL != "" {
args = append(args, "--remote", "--url="+remoteURL)
}
args = append(args,
"run",
"--rm",
"--pull=missing",
"-i",
)
// --userns=host only in local mode: needed when zddc-server itself
// is the one running podman inside a Kubernetes pod, because the
// kernel won't let an inner rootless podman set up its own userns
// via newuidmap. In remote (sidecar) mode the sidecar runs as root
// and creates the inner container in its own (rootful) namespace,
// so --userns=host is unnecessary and potentially noisy.
if remoteURL == "" {
args = append(args, "--userns=host")
}
args = append(args,
"--network=none",
"--read-only",
// /tmp must be large enough to host chromium's shared-memory
// fallback (--disable-dev-shm-usage redirects /dev/shm writes
// here) plus the user-data-dir. 256 MiB is plenty for the
// HTML→PDF flow; pandoc itself uses almost none.
"--tmpfs=/tmp:size=256m,exec",
"--tmpfs=/run:size=4m",
fmt.Sprintf("--memory=%dm", memMiB),
fmt.Sprintf("--cpus=%s", cpus),
fmt.Sprintf("--pids-limit=%d", pids),
"--cap-drop=ALL",
"--security-opt=no-new-privileges",
"--env=HOME=/tmp",
"--workdir=/tmp",
)
for _, m := range mounts {
if !strings.Contains(m, ":ro") && !strings.Contains(m, ":rw") {
m += ":ro"
}
args = append(args, "--volume="+m)
}
args = append(args, image)
args = append(args, cmd...)
c := exec.CommandContext(runCtx, engine, args...)
c.Cancel = func() error {
if c.Process == nil {
return nil
}
return c.Process.Kill()
}
c.WaitDelay = 2 * time.Second
c.SysProcAttr = sysProcAttr()
c.Env = []string{
"PATH=" + os.Getenv("PATH"),
"HOME=" + os.TempDir(),
}
c.Stdin = bytes.NewReader(stdin)
var stdoutBuf bytes.Buffer
c.Stdout = &limitWriter{w: &stdoutBuf, max: 128 << 20}
stderr := newRingWriter(4 << 10)
c.Stderr = stderr
err := c.Run()
if err != nil {
exitCode := -1
if ee, ok := err.(*exec.ExitError); ok {
exitCode = ee.ExitCode()
}
toolName := imageTag(image)
if runCtx.Err() == context.DeadlineExceeded {
return nil, &ConvertError{
Tool: toolName,
ExitCode: exitCode,
Stderr: stderr.String(),
Cause: fmt.Errorf("timeout after %s: %w", timeout, runCtx.Err()),
}
}
return nil, &ConvertError{
Tool: toolName,
ExitCode: exitCode,
Stderr: stderr.String(),
Cause: err,
}
}
return stdoutBuf.Bytes(), nil
}
// ───────────────────────────────────────────────────────────────────────────
// bwrapRunner — default conversion engine.
//
// Wraps `bubblewrap` to run pandoc / chromium binaries directly in a
// per-call Linux-namespace sandbox. No daemon, no OCI images, no
// privileged outer container. Image-build bundles pandoc + chromium
// into the zddc-server image so the binaries are available on PATH;
// each conversion gets a fresh set of namespaces, a read-only view
// of the host's /usr (so the binary + its libs are visible), a tmpfs
// /tmp, and nothing else.
//
// This matches the threat model of the legacy containerRunner —
// untrusted source-markdown drives the binary, we contain any
// resulting RCE inside the bwrap sandbox — without the operational
// tax of running a container engine per conversion (image pull,
// daemon, socket, ~300ms startup).
//
// Hardening (mirror of containerRunner's flags):
// - --unshare-all + --share-net=off via omission → no network
// - --unshare-user-try → user namespace when kernel allows it
// - --die-with-parent → cleanup on zddc-server exit
// - --ro-bind /usr /usr, /lib /lib, /lib64 /lib64, /etc /etc, /bin /bin
// (where present) → tools + libs visible read-only
// - --proc /proc, --dev /dev → minimal pseudo-filesystems
// - --tmpfs /tmp (256 MiB) → scratch space, matches container path
// - --chdir /tmp → workdir
// - --clearenv + minimal HOME/PATH/LANG → no host env leaks
// - --cap-drop ALL (bwrap default, explicit for clarity)
// ───────────────────────────────────────────────────────────────────────────
type bwrapRunner struct {
mu sync.RWMutex
bin string // path to bwrap binary
memMiB int // currently advisory; bwrap has no built-in cap
cpus string // currently advisory
pids int // currently advisory
timeout time.Duration // context deadline per Run
}
func newBwrapRunner(bin string) *bwrapRunner {
return &bwrapRunner{
bin: bin,
memMiB: 512,
cpus: "2",
pids: 100,
timeout: 30 * time.Second,
}
}
// SetLimits — same shape as containerRunner.SetLimits. bwrap itself
// doesn't enforce cgroup limits; we capture the values so an operator
// can read them back via /.profile/config or the convert-health probe.
// Wrapping with systemd-run --scope --property MemoryMax=… is the
// follow-up if hard caps are needed; not in this iteration.
func (br *bwrapRunner) SetLimits(memMiB int, cpus string, pids int, timeout time.Duration) {
br.mu.Lock()
defer br.mu.Unlock()
if memMiB > 0 {
br.memMiB = memMiB
}
if cpus != "" {
br.cpus = cpus
}
if pids > 0 {
br.pids = pids
}
if timeout > 0 {
br.timeout = timeout
}
}
func (br *bwrapRunner) Run(ctx context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error) {
br.mu.RLock()
bwrapBin := br.bin
timeout := br.timeout
br.mu.RUnlock()
if bwrapBin == "" {
return nil, ErrUnavailable
}
if tool.Binary == "" {
return nil, fmt.Errorf("convert.Run: tool.Binary is empty (bwrapRunner requires a host-binary name)")
}
runCtx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
args, err := buildBwrapArgs(tool.Binary, mounts, cmd)
if err != nil {
return nil, err
}
c := exec.CommandContext(runCtx, bwrapBin, args...)
c.Cancel = func() error {
if c.Process == nil {
return nil
}
return c.Process.Kill()
}
c.WaitDelay = 2 * time.Second
c.SysProcAttr = sysProcAttr()
c.Env = []string{
"PATH=" + os.Getenv("PATH"),
"HOME=" + os.TempDir(),
}
c.Stdin = bytes.NewReader(stdin)
var stdoutBuf bytes.Buffer
c.Stdout = &limitWriter{w: &stdoutBuf, max: 128 << 20}
stderr := newRingWriter(4 << 10)
c.Stderr = stderr
if runErr := c.Run(); runErr != nil {
exitCode := -1
if ee, ok := runErr.(*exec.ExitError); ok {
exitCode = ee.ExitCode()
}
toolName := tool.Binary
if runCtx.Err() == context.DeadlineExceeded {
return nil, &ConvertError{
Tool: toolName,
ExitCode: exitCode,
Stderr: stderr.String(),
Cause: fmt.Errorf("timeout after %s: %w", timeout, runCtx.Err()),
}
}
return nil, &ConvertError{
Tool: toolName,
ExitCode: exitCode,
Stderr: stderr.String(),
Cause: runErr,
}
}
return stdoutBuf.Bytes(), nil
}
// buildBwrapArgs assembles the bwrap argv for a single conversion.
// Exposed as a package-internal helper so tests can lock the sandbox
// flag shape without exec'ing bwrap. Returns an error when a mount
// spec is malformed.
func buildBwrapArgs(binary string, mounts, cmd []string) ([]string, error) {
args := []string{
// Namespace isolation. --unshare-all unshares user (when
// available), ipc, pid, net, uts, cgroup; --unshare-user-try
// downgrades cleanly when the kernel refuses (e.g. some
// container hosts disable user-namespace creation).
"--unshare-all",
"--unshare-user-try",
"--die-with-parent",
// Read-only system view. Each --ro-bind only mounts paths
// that exist on the host; for hosts where /lib is a symlink
// into /usr/lib (modern Linux) the symlink resolution lets
// bwrap mount /usr's contents through.
"--ro-bind", "/usr", "/usr",
"--ro-bind-try", "/lib", "/lib",
"--ro-bind-try", "/lib64", "/lib64",
"--ro-bind-try", "/bin", "/bin",
"--ro-bind-try", "/sbin", "/sbin",
"--ro-bind-try", "/etc", "/etc",
// Pseudo-filesystems. /proc and /dev are required for any
// non-trivial binary; we make them minimal.
"--proc", "/proc",
"--dev", "/dev",
// Scratch. 256 MiB tmpfs at /tmp matches containerRunner.
// chromium spills its shared-memory fallback (--disable-dev-
// shm-usage) here, so the budget actually matters.
"--tmpfs", "/tmp",
"--size", "268435456", // 256 MiB; applies to the most recent --tmpfs
"--chdir", "/tmp",
// Minimal env. HOME=/tmp lets chromium write its
// user-data-dir without permission errors; PATH covers the
// usual install locations for pandoc + chromium across
// alpine / debian / rhel.
"--clearenv",
"--setenv", "HOME", "/tmp",
"--setenv", "PATH", "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"--setenv", "LANG", "C.UTF-8",
}
// Caller-supplied bind mounts (template, output, …). Same
// "host:target[:ro|:rw]" syntax as containerRunner; we translate
// to bwrap's --ro-bind / --bind.
for _, m := range mounts {
host, target, mode, ok := splitMount(m)
if !ok {
return nil, fmt.Errorf("convert.Run: invalid mount spec %q (want host:target[:ro|:rw])", m)
}
if mode == "rw" {
args = append(args, "--bind", host, target)
} else {
args = append(args, "--ro-bind", host, target)
}
}
// Finally the binary + its argv. The binary path is PATH-resolved
// inside the sandbox via the constructed PATH above; if the
// operator passed an absolute path it bypasses PATH lookup and is
// invoked verbatim (still subject to the /usr bind mount).
args = append(args, binary)
args = append(args, cmd...)
return args, nil
}
// splitMount parses "host:target[:ro|:rw]" into its three parts.
// The mode segment is optional; absent means read-only (matches the
// containerRunner default).
func splitMount(m string) (host, target, mode string, ok bool) {
parts := strings.SplitN(m, ":", 3)
if len(parts) < 2 {
return "", "", "", false
}
host = parts[0]
target = parts[1]
mode = "ro"
if len(parts) == 3 {
switch parts[2] {
case "ro", "rw":
mode = parts[2]
default:
return "", "", "", false
}
}
return host, target, mode, true
}
// imageTag extracts a short name for an image reference, used as the
// "Tool" label on ConvertError. "docker.io/pandoc/latex:latest" →
// "pandoc/latex".
func imageTag(image string) string {
s := image
// Strip registry prefix.
if i := strings.Index(s, "/"); i >= 0 {
if strings.Contains(s[:i], ".") || strings.Contains(s[:i], ":") {
s = s[i+1:]
}
}
// Strip tag suffix.
if i := strings.LastIndex(s, ":"); i >= 0 {
s = s[:i]
}
return s
}
// limitWriter caps the underlying buffer at max bytes. Writes past the
// cap return io.ErrShortWrite, which surfaces as a Run() error — the
// caller then maps to 422 (output too large) at the handler edge. // caller then maps to 422 (output too large) at the handler edge.
type limitWriter struct { type limitWriter struct {
w io.Writer w io.Writer
@ -600,9 +241,9 @@ func (l *limitWriter) Write(p []byte) (int, error) {
return n, err
}
-// ringWriter keeps only the tail of what's written — useful for stderr
-// capture where the most-recent bytes are the ones with the actual
-// error message and earlier output is usually progress noise.
+// ringWriter keeps only the tail of what's written — useful for
+// stderr capture where the most-recent bytes carry the actual error
+// message and earlier output is usually progress noise.
type ringWriter struct {
mu sync.Mutex
buf []byte
@ -636,16 +277,14 @@ func (r *ringWriter) String() string {
// writeAssetsToScratch materialises the embedded viewer-template.html
// and custom.css into a fresh scratch dir and returns the host path.
// Caller is responsible for os.RemoveAll(dir) when done. Used by
-// ToHTML which needs the template visible inside the container.
+// ToHTML which needs the template visible inside the sandbox.
//
-// scratchRoot controls where the temp dir lands. Empty means "use
-// $TMPDIR" (local mode default). In remote/sidecar mode the caller
-// passes the shared mount path (e.g. "/work") so the podman-service
-// sidecar sees the bind-mount source at the same path.
+// scratchRoot controls where the temp dir lands. Empty means
+// "use $TMPDIR".
//
-// Files are written world-readable so the container's default user
-// (root for pandoc/latex, uid 1000 for alpine-chrome) can read them
-// through the read-only bind mount regardless of the host's umask.
+// Files are written world-readable so the binary's default user can
+// read them through the wrapper's bind mount regardless of the
+// host's umask.
func writeAssetsToScratch(scratchRoot string) (string, error) {
dir, err := os.MkdirTemp(scratchRoot, "zddc-convert-")
if err != nil {


@ -97,7 +97,7 @@ func ServeConverted(cfg config.Config, w http.ResponseWriter, r *http.Request, s
if !ok {
// One re-probe attempt — gives the operator a way to recover
// after building the image without restarting the server.
-caps = convert.Reprobe(r.Context(), os.Getenv("ZDDC_CONVERT_ENGINE"))
+caps = convert.Reprobe(r.Context())
if !caps.Ready() {
w.Header().Set("Retry-After", "60")
http.Error(w, "Service Unavailable — "+caps.Reason(), http.StatusServiceUnavailable)


@ -1511,7 +1511,7 @@ body.is-elevated::after {
</svg>
<div class="header-title-group">
<span class="app-header__title" id="table-title">ZDDC Table</span>
-<span class="build-timestamp"><span style="color:red;font-weight:bold">v0.0.17-alpha · 2026-05-19 11:59:55 · 73e34be-dirty</span></span>
+<span class="build-timestamp"><span style="color:red;font-weight:bold">v0.0.17-alpha · 2026-05-19 12:37:53 · 847e082-dirty</span></span>
</div>
</div>
<div class="header-right">


@ -1,18 +1,30 @@
# Runtime image for zddc-server.
#
-# Bundles the conversion toolchain (pandoc + chromium + bubblewrap) so
-# the MD→DOCX/HTML/PDF endpoint works without an external container
-# engine. The convert package's bwrap engine (production default)
-# sandboxes each pandoc/chromium invocation in a fresh Linux-namespace;
-# no daemon, no socket, no privileged outer container, no OCI image
-# pull at conversion time.
+# Bundles the conversion toolchain (pandoc + chromium + bubblewrap)
+# AND two wrapper scripts that shadow the real binaries on PATH.
+# When zddc-server exec's "pandoc" or "chromium-browser", it hits
+# /usr/local/bin/pandoc (a symlink to runtime/zddc-sandbox-exec),
+# which:
+#
+# 1. creates a transient cgroup v2 with memory + pids caps,
+# 2. drops the process into that cgroup,
+# 3. wraps the real binary in a bubblewrap sandbox (private
+# namespaces, read-only /usr, fresh tmpfs at /tmp, no network),
+# 4. exec's /usr/bin/<name>.
+#
+# zddc-server's Go code is unaware of any of this — its only contract
+# is "if I exec pandoc with these args, I get pandoc behavior." The
+# isolation strategy lives entirely in the image; an operator who
+# wants firejail / systemd-nspawn / podman-run instead just replaces
+# the wrapper script and the binary code keeps working.
#
# Used by helm charts (helm/zddc-server-prod/) as the main-container
-# image. The build is independent of zddc-server itself — the binary
-# is built by the helm chart's init container from a pinned git ref
-# and copied into this runtime image's filesystem at start. Image
-# tags should track the upstream package versions (pandoc, chromium)
-# more than zddc-server, since the binary is layered in at deploy time.
+# image. The binary is built by the chart's init container from a
+# pinned git ref and copied into a shared emptyDir; the chart's
+# command is /usr/local/libexec/zddc-cgroup-init /zddc/zddc-server,
+# so the cgroup v2 hierarchy is delegated before zddc-server starts
+# (see runtime/zddc-cgroup-init for the "no internal processes"
+# constraint that requires this indirection).
#
# Build:
# podman build -t zddc-server-runtime:latest \
@ -23,8 +35,7 @@
# codeberg.org/varasys/zddc-server-runtime:vYYYYMMDD
# podman push codeberg.org/varasys/zddc-server-runtime:vYYYYMMDD
#
-# Size: ≈ 1 GB unpacked (chromium dominates). Container engines
-# layer + dedupe the chromium libs across replicas on the same node.
+# Size: ≈ 1 GB unpacked (chromium dominates).
FROM docker.io/library/alpine:3
RUN apk add --no-cache \
@ -34,8 +45,12 @@ RUN apk add --no-cache \
font-noto \
ca-certificates
-# The init container in helm/zddc-server-*/templates/deployment.yaml
-# writes the compiled zddc-server binary to /zddc/zddc-server in a
-# shared emptyDir volume; the main container's command is
-# `/zddc/zddc-server`. No CMD/ENTRYPOINT here because the binary
-# path is provided by the chart, not baked into the image.
+# Wrapper scripts. zddc-cgroup-init runs at container start to
+# prepare cgroup v2 subtree_control delegation; zddc-sandbox-exec
+# is invoked per-conversion via the symlinks below.
+COPY runtime/zddc-cgroup-init /usr/local/libexec/zddc-cgroup-init
+COPY runtime/zddc-sandbox-exec /usr/local/libexec/zddc-sandbox-exec
+RUN chmod 0755 /usr/local/libexec/zddc-cgroup-init \
+/usr/local/libexec/zddc-sandbox-exec \
+&& ln -s /usr/local/libexec/zddc-sandbox-exec /usr/local/bin/pandoc \
+&& ln -s /usr/local/libexec/zddc-sandbox-exec /usr/local/bin/chromium-browser
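
Reviewer note: the symlink-shadowing arrangement in this Dockerfile can be tried outside the image. The sketch below is an illustration only; the temp-dir layout and the stand-in "real" tools are assumptions made for the demo and are not part of the commit. It shows one wrapper script reached through two symlinks, dispatching on `basename "$0"`, with the shadow directory placed ahead of the real one on PATH.

```shell
#!/bin/sh
# Sketch: one wrapper, several symlinked names, dispatch on basename "$0".
# Everything lives under a throwaway temp dir (mktemp -d is assumed to be
# available); nothing here touches /usr/local/bin or /usr/bin.
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/libexec" "$demo/shadow" "$demo/real"

# Stand-ins for the real tools (the image keeps the real ones in /usr/bin).
for tool in pandoc chromium-browser; do
    printf '#!/bin/sh\necho "real %s: $*"\n' "$tool" > "$demo/real/$tool"
    chmod 0755 "$demo/real/$tool"
done

# The wrapper learns which tool was requested from the name it was run as.
cat > "$demo/libexec/wrapper" <<EOF
#!/bin/sh
name=\$(basename "\$0")
echo "wrapper invoked as \$name"
exec "$demo/real/\$name" "\$@"
EOF
chmod 0755 "$demo/libexec/wrapper"

ln -s "$demo/libexec/wrapper" "$demo/shadow/pandoc"
ln -s "$demo/libexec/wrapper" "$demo/shadow/chromium-browser"

# The shadow dir comes first, so a lookup by short name finds the symlink.
PATH="$demo/shadow:$demo/real:$PATH"

pandoc --version
chromium-browser --headless
```

Each call prints a "wrapper invoked as ..." line followed by the stand-in tool's own line, which is the same contract the Go side relies on: exec by short name, get the tool's behavior back.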

zddc/runtime/zddc-cgroup-init Executable file

@ -0,0 +1,82 @@
#!/bin/sh
# zddc-cgroup-init — prepare cgroup v2 hierarchy and exec zddc-server.
#
# The per-conversion wrapper (zddc-sandbox-exec) creates a transient
# child cgroup for each pandoc / chromium invocation, sets memory.max
# and pids.max on it, and moves the conversion process in. That only
# works when:
#
# (a) the cgroup v2 hierarchy is mounted at /sys/fs/cgroup, AND
# (b) the controllers we need (memory, pids) are enabled in the
# parent cgroup's subtree_control file, AND
# (c) the parent cgroup has NO processes in it (cgroup v2's
# "no internal processes" constraint: a non-root cgroup can
# enable controllers for its children only while it holds no
# processes of its own).
#
# A bare container with PID 1 in the root cgroup violates (c). This
# init script does the one-time setup BEFORE exec'ing zddc-server:
#
# 1. mkdir /sys/fs/cgroup/zddc/ (a leaf cgroup for zddc-server)
# 2. move every PID out of root into /sys/fs/cgroup/zddc/
# 3. enable +memory +pids in root's subtree_control (now empty)
# 4. try to enable +memory +pids in zddc/'s subtree_control too
# (best-effort; the per-conversion cgroups are siblings of
# zddc/, not children, so they do not depend on this step)
# 5. exec zddc-server (which inherits cgroup membership in zddc/)
#
# After this, the wrapper script creates /sys/fs/cgroup/conv.<pid>/
# as a sibling of /sys/fs/cgroup/zddc/, sets limits, and moves the
# pandoc/chromium process in. Each conversion gets a fresh transient
# cgroup; cgroup v2 keeps the empty directory after the process
# exits until it is removed with rmdir.
#
# Best-effort: if any step fails (cgroup v1, undelegated subtree,
# read-only cgroupfs in some other container shape), this script
# still exec's zddc-server. The convert pipeline degrades to
# "bwrap sandbox + wall-clock timeout"; an operator notices via
# the warning log line below.
set -eu
setup_cgroup_v2() {
cgroot=/sys/fs/cgroup
[ -d "$cgroot" ] || return 1
# Detect cgroup v2 by the presence of cgroup.controllers at root.
[ -r "$cgroot/cgroup.controllers" ] || return 1
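# Warn if memory is missing from the available controllers. pids is
# not checked here; if either is missing, the subtree_control write
# below fails as a whole.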
if ! grep -qw memory "$cgroot/cgroup.controllers"; then
echo "zddc-cgroup-init: cgroup.controllers lacks 'memory' — per-conversion memory cap will be unenforced" >&2
fi
# Create the leaf where zddc-server itself will live.
mkdir -p "$cgroot/zddc" || return 1
# Move every PID currently in the root cgroup into zddc/. The
# root must be empty before we can enable subtree_control.
if [ -r "$cgroot/cgroup.procs" ]; then
while read -r pid; do
[ -n "$pid" ] || continue
# Best-effort; processes can exit between read and write.
printf "%s\n" "$pid" > "$cgroot/zddc/cgroup.procs" 2>/dev/null || true
done < "$cgroot/cgroup.procs"
fi
# Enable controllers at root → makes them usable in immediate
# children (zddc/ and any sibling per-conversion cgroup).
printf "+memory +pids" > "$cgroot/cgroup.subtree_control" 2>/dev/null || {
echo "zddc-cgroup-init: could not enable +memory +pids in $cgroot/cgroup.subtree_control — caps will not apply" >&2
return 1
}
# Try the same inside zddc/. Best-effort: the kernel rejects this
# write while zddc/ itself holds processes, and the per-conversion
# cgroups are siblings of zddc/, so nothing depends on it.
printf "+memory +pids" > "$cgroot/zddc/cgroup.subtree_control" 2>/dev/null || true
return 0
}
if ! setup_cgroup_v2; then
echo "zddc-cgroup-init: cgroup v2 setup unavailable — running without per-conversion caps" >&2
fi
# Hand off to zddc-server. The exec'd process lands in
# /sys/fs/cgroup/zddc/ (we moved ourselves there above). When it
# spawns the wrapper, the wrapper creates a transient sibling cgroup
# under /sys/fs/cgroup/, NOT a child of zddc/, so the conversion's
# cgroup is a peer of zddc-server's — keeping zddc-server's own
# resource accounting separate from conversion accounting.
exec "$@"
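
Reviewer note: the best-effort degradation described above hinges on whether memory and pids end up listed in the parent's cgroup.subtree_control. A minimal sketch of that check follows. The helper name `controllers_enabled` and the fake directory tree are hypothetical, introduced only so the example runs on machines without a writable cgroupfs; the commit itself only greps for memory.

```shell
#!/bin/sh
# Sketch: decide whether per-conversion caps can apply under a cgroup root.
set -eu

# controllers_enabled ROOT: succeed only if ROOT/cgroup.subtree_control
# lists both "memory" and "pids". (Hypothetical helper, not in the commit.)
controllers_enabled() {
    ctl="$1/cgroup.subtree_control"
    [ -r "$ctl" ] || return 1
    grep -qw memory "$ctl" && grep -qw pids "$ctl"
}

# Exercise the helper against a fake tree instead of /sys/fs/cgroup.
fake=$(mktemp -d)
trap 'rm -rf "$fake"' EXIT
mkdir "$fake/ready" "$fake/partial" "$fake/v1"
printf 'cpu memory pids\n' > "$fake/ready/cgroup.subtree_control"
printf 'cpu memory\n' > "$fake/partial/cgroup.subtree_control"
# "$fake/v1" has no cgroup.subtree_control at all, like a cgroup v1 host.

for d in ready partial v1; do
    if controllers_enabled "$fake/$d"; then
        echo "$d: caps available"
    else
        echo "$d: running without per-conversion caps"
    fi
done
```

On a real container the same helper could be pointed at /sys/fs/cgroup after zddc-cgroup-init has run, to confirm that the delegation took effect.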

zddc/runtime/zddc-sandbox-exec Executable file

@ -0,0 +1,118 @@
#!/bin/sh
# zddc-sandbox-exec — drop-in wrapper for pandoc and chromium-browser.
#
# Invoked via symlinks at /usr/local/bin/pandoc and
# /usr/local/bin/chromium-browser. zddc-server (and any other caller
# that uses the default PATH) exec's by short name, hits this script
# first, and we transparently run the real binary at /usr/bin/<name>
# inside:
#
# 1. a transient cgroup v2 (memory + pids cap, kernel-enforced)
# 2. a bubblewrap sandbox (private namespaces, ro-bind /usr, fresh
# tmpfs at /tmp, no network)
#
# zddc-server's Go code does not know about either layer — its only
# contract with the image is "if I exec pandoc with these args, I
# get pandoc behavior back." Swap the wrapper for a different
# isolation strategy (firejail, nspawn, podman-run, raw exec) and
# nothing changes in Go.
#
# Caller-tunable env (with defaults):
#
# ZDDC_SCRATCH host directory to bind-mount read-write
# inside the sandbox at the SAME path. Set by
# zddc-server per-conversion; the markdown
# template, intermediate HTML, and chromium
# output PDF all live there. Absent = no extra
# bind mount; /tmp is a fresh tmpfs only.
# ZDDC_CONV_MEM_MAX cgroup memory.max value (default "1G").
# cgroup v2 syntax — bytes, "1G", or "max".
# ZDDC_CONV_PIDS_MAX cgroup pids.max value (default "256").
# ZDDC_CONV_TMPFS_SIZE bwrap tmpfs /tmp byte size (default 256 MiB).
set -eu
NAME=$(basename "$0")
REAL="/usr/bin/$NAME"
if [ ! -x "$REAL" ]; then
echo "zddc-sandbox-exec: $NAME — real binary not found at $REAL" >&2
exit 127
fi
# ── 1. cgroup v2 (best-effort) ──────────────────────────────────────────
#
# zddc-cgroup-init enables +memory +pids in
# /sys/fs/cgroup/cgroup.subtree_control at container start (see that
# script for the cgroup v2 "no internal processes" wrinkle that
# requires the indirection). Here we just need to mkdir a transient
# child, set caps, move ourselves in. The real binary inherits cgroup
# membership at exec.
CG_ROOT="/sys/fs/cgroup"
CG_CONTROL="$CG_ROOT/cgroup.subtree_control"
if [ -w "$CG_CONTROL" ] && grep -qw memory "$CG_CONTROL" 2>/dev/null; then
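CG="$CG_ROOT/conv.$$"
# Clear a stale, empty cgroup left behind by an earlier wrapper
# that ran with the same PID; rmdir fails harmlessly if the
# directory is absent or still in use.
rmdir "$CG" 2>/dev/null || true
if mkdir "$CG" 2>/dev/null; then
# The trap below only covers the case where the wrapper exits
# before exec'ing the real binary: a successful exec replaces
# this shell, so the trap never fires afterwards. cgroup v2 does
# not remove empty cgroups on its own, so a conv.<pid> directory
# persists until something rmdir's it.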
trap 'rmdir "$CG" 2>/dev/null || true' EXIT INT TERM
printf "%s\n" "${ZDDC_CONV_MEM_MAX:-1G}" > "$CG/memory.max" 2>/dev/null || true
printf "%s\n" "${ZDDC_CONV_PIDS_MAX:-256}" > "$CG/pids.max" 2>/dev/null || true
printf "%s\n" "$$" > "$CG/cgroup.procs" 2>/dev/null || true
fi
fi
# ── 2. bwrap sandbox ────────────────────────────────────────────────────
#
# Mirror the hardening that internal/convert previously assembled in
# Go: unshare every namespace (--unshare-all also covers network),
# bind /usr read-only so the binary + its libs are visible, drop a
# fresh tmpfs at /tmp, clear the environment to a minimal floor.
#
# Building the bwrap argv preserves "$@" (the original pandoc /
# chromium args) by PREPENDING bwrap flags onto the existing
# positional parameters. Each `set -- new-flag "$@"` puts one flag
# at the front, so the statements read back-to-front and the final
# argv is:
#
# bwrap --unshare-all --unshare-user-try ... -- REAL_BINARY ORIG_ARGS
#
# This is the standard POSIX-sh idiom for "build a command line
# without an array type."
set -- "$REAL" "$@" # REAL ORIG
set -- -- "$@" # -- REAL ORIG
# Optional scratch dir, prepended just before "-- REAL ORIG" so it
# lands inside the bwrap flag list:
if [ -n "${ZDDC_SCRATCH:-}" ] && [ -d "$ZDDC_SCRATCH" ]; then
set -- --bind "$ZDDC_SCRATCH" "$ZDDC_SCRATCH" "$@"
fi
# Common bwrap flags (each one prepended; final order is bottom-up).
set -- --setenv LANG C.UTF-8 "$@"
set -- --setenv PATH /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin "$@"
set -- --setenv HOME /tmp "$@"
set -- --clearenv "$@"
set -- --chdir /tmp "$@"
# bwrap's --size sets the size of the NEXT --tmpfs, so in argv order
# --size must come before --tmpfs. Building bottom-up via prepend means
# the LATER statement here lands earlier in argv: write --tmpfs first
# then --size, so the final $@ starts with "... --size N --tmpfs /tmp".
set -- --tmpfs /tmp "$@"
set -- --size "${ZDDC_CONV_TMPFS_SIZE:-268435456}" "$@"
set -- --dev /dev "$@"
set -- --proc /proc "$@"
set -- --ro-bind-try /etc /etc "$@"
set -- --ro-bind-try /sbin /sbin "$@"
set -- --ro-bind-try /bin /bin "$@"
set -- --ro-bind-try /lib64 /lib64 "$@"
set -- --ro-bind-try /lib /lib "$@"
set -- --ro-bind /usr /usr "$@"
set -- --die-with-parent "$@"
set -- --unshare-user-try "$@"
set -- --unshare-all "$@"
exec bwrap "$@"
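
Reviewer note: the prepend idiom used above is easy to misread, so here is a small standalone sketch of it. `build_argv` and the placeholder `real-binary` are hypothetical names for the demo; no bwrap is involved. It shows that the statement written last ends up first in argv, and that the original arguments survive intact, including one containing a space.

```shell
#!/bin/sh
# Sketch of the prepend idiom: each `set -- flag "$@"` pushes onto the
# FRONT of the positional parameters, so later statements land earlier.
set -eu

build_argv() {
    # "$@" starts out as the caller's original arguments.
    set -- real-binary "$@"        # real-binary ORIG...
    set -- -- "$@"                 # -- real-binary ORIG...
    set -- --tmpfs /tmp "$@"       # written first, lands later in argv
    set -- --size 1048576 "$@"     # written second, lands before --tmpfs
    set -- --unshare-all "$@"      # written last, lands first in argv
    # One argument per line keeps embedded spaces visible.
    printf '%s\n' "$@"
}

build_argv --from markdown "my file.md"
```

The printed order is --unshare-all, --size, 1048576, --tmpfs, /tmp, --, real-binary, followed by the three original arguments, with "my file.md" still a single argument.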