feat(convert): bwrap engine as production default
Replaces the always-spawn-an-OCI-container model with a per-call
bubblewrap sandbox. Pandoc and chromium binaries are baked into the
zddc-server runtime image; each conversion runs them under bwrap's
Linux-namespace isolation. No daemon, no socket, no privileged outer
container, no OCI image pull at conversion time.
Why: the OCI engine cost ≈ 350 MB of image pulls, 400 MB of persistent
storage, and ~300 ms of startup per conversion. It also required either
an on-host daemon socket (zddc-RCE → host-RCE in one hop) or nested
container privileges. bwrap gets the same sandbox properties
(--unshare-all, ro-bind /usr, tmpfs /tmp, clearenv, no-network) at
~5 ms per call and zero external dependencies. This is the same
primitive Flatpak uses for every app launch — battle-tested at scale
for "untrusted-input, short-lived, isolated."
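As a rough illustration of the flag set named above, the sketch below assembles a bwrap argv for one pandoc call. `sandboxArgs` is a hypothetical helper, not the commit's `buildBwrapArgs`; the flags are the ones listed in this message and in AGENTS.md, and any extra binds the real helper adds are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// sandboxArgs is a hypothetical stand-in for the commit's buildBwrapArgs.
// It prepends the hardening flags named in the commit message to the
// tool's own argv. Additional binds the real helper may add are omitted.
func sandboxArgs(binary string, cmd []string) []string {
	args := []string{
		"--unshare-all",      // fresh namespaces, including network
		"--unshare-user-try", // user namespace when the kernel allows it
		"--die-with-parent",  // sandbox dies if the server process exits
		"--ro-bind", "/usr", "/usr", // binaries and shared libs, read-only
		"--proc", "/proc", // minimal /proc
		"--dev", "/dev", // minimal /dev
		"--tmpfs", "/tmp", // writable scratch space
		"--clearenv", // no host environment leaks in
	}
	args = append(args, binary)
	return append(args, cmd...)
}

func main() {
	argv := sandboxArgs("pandoc", []string{"--from=markdown", "--to=docx", "-"})
	fmt.Println("bwrap " + strings.Join(argv, " "))
}
```

The binary and its arguments come last, so everything before them is sandbox configuration that the caller cannot override.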
Runner abstraction:
- `Runner.Run` signature: image string → ToolSpec{Image, Binary}.
Both fields populated by entry points; whichever engine is
installed reads the one it needs.
- `bwrapRunner` (new): assembles bwrap argv via `buildBwrapArgs`
helper (testable in isolation), spawns bwrap with the binary.
- `containerRunner` (renamed conceptually to "legacy fallback"):
unchanged behavior, still reachable for hosts that prefer OCI
containers per conversion.
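The shape of the abstraction can be sketched as follows. The `ToolSpec` fields and the `Run` signature follow what the test fakes in this commit show; `binaryEngine` and `imageEngine` are hypothetical stand-ins that only report which field they would read, not the real `bwrapRunner` / `containerRunner`.

```go
package main

import (
	"context"
	"fmt"
)

// ToolSpec carries both references; each engine reads the one it needs.
type ToolSpec struct {
	Image  string // OCI image ref, read by the container engine
	Binary string // binary name or path, read by the bwrap engine
}

// Runner mirrors the Run signature used by the fakes in convert_test.go.
type Runner interface {
	Run(ctx context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error)
}

// binaryEngine is a hypothetical engine that reads only tool.Binary.
type binaryEngine struct{}

func (binaryEngine) Run(_ context.Context, tool ToolSpec, _ []byte, _ []string, _ []string) ([]byte, error) {
	return []byte("would exec binary " + tool.Binary), nil
}

// imageEngine is a hypothetical engine that reads only tool.Image.
type imageEngine struct{}

func (imageEngine) Run(_ context.Context, tool ToolSpec, _ []byte, _ []string, _ []string) ([]byte, error) {
	return []byte("would run image " + tool.Image), nil
}

// describe calls whichever runner is installed with the same ToolSpec.
func describe(r Runner, tool ToolSpec) string {
	out, err := r.Run(context.Background(), tool, nil, nil, nil)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	tool := ToolSpec{Image: "docker.io/pandoc/latex:latest", Binary: "pandoc"}
	fmt.Println(describe(binaryEngine{}, tool))
	fmt.Println(describe(imageEngine{}, tool))
}
```

Because the call site always fills both fields, it does not need to know which engine the probe installed.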
Probe order in health.Probe: bwrap → podman → docker. First hit wins.
Engine kinds in Capabilities: "bwrap" | "podman" | "docker". The
no-engine error message now lists all three.
Config (cmd/zddc-server):
- new --convert-pandoc-binary / ZDDC_CONVERT_PANDOC_BINARY (default "pandoc")
- new --convert-chromium-binary / ZDDC_CONVERT_CHROMIUM_BINARY (default "chromium-browser")
- existing --convert-pandoc-image / --convert-chromium-image kept
for the OCI engine, doc updated to clarify they only apply there.
- --convert-engine helptext lists bwrap first.
Images:
- New `zddc/runtime.Containerfile` — alpine + bubblewrap + pandoc-cli +
chromium + font-noto. Documents build/publish workflow.
- helm/zddc-server-prod/values.yaml.example: runtimeImage default
switched to a placeholder for the new bundled runtime image; bare
alpine NO LONGER works for /.convert (clearly called out in the
comment).
- bitnest dev: /var/lib/zddc-dev-build/Containerfile mirrors the
production runtime image. Quadlet at /etc/containers/systemd/
zddc.container drops the podman-socket mount (no longer needed)
and sets ZDDC_CONVERT_ENGINE=bwrap explicitly to avoid silent
downgrades if a stray podman ends up on PATH.
Tests:
- convert_test.go: fakeRunner / recordingRunner now record ToolSpec.
- New TestToolSpecPopulation pins that both Image and Binary are
filled by every entry point.
- New TestBwrapArgs_SandboxFlagsPresent / MountTranslation /
RejectsBadMountSpec lock in the bwrap argv shape: a refactor that
drops a hardening flag or misroutes a mount fails loudly.
Docs:
- AGENTS.md § "Server-side document conversion" rewritten around
the bwrap-first model with podman/docker as legacy fallbacks.
- ARCHITECTURE.md convert reference updated.
- internal/convert package doc reflects the two-engine probe order.
Verified end-to-end on bitnest: probe reports
engine=bwrap pandoc_binary=pandoc chromium_binary=chromium-browser
on startup. All 15 Go test packages green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parent: 85e6eb152c
Commit: da4754b6ef
11 changed files with 633 additions and 102 deletions
16
AGENTS.md
@ -347,15 +347,17 @@ The markdown editor lives at `browse/js/preview-markdown.js` and is mounted as t
|
|||
|
||||
zddc-server can convert `.md` → DOCX/HTML/PDF on demand at `GET /<path>/foo.md?convert=docx|html|pdf`. Implementation:
|
||||
|
||||
- **Two upstream images, pulled on first use.** No custom image build. Operator just needs `podman` or `docker` installed; the runner passes `--pull=missing` so the first request pulls each image and subsequent requests use the local cache.
|
||||
- `docker.io/pandoc/latex:latest` — pandoc's official image, entrypoint `pandoc`. Used for MD → DOCX and MD → HTML. Override via `--convert-pandoc-image=` / `ZDDC_CONVERT_PANDOC_IMAGE` (e.g. switch to `docker.io/pandoc/core:latest` for a ~90% size reduction).
|
||||
- `docker.io/zenika/alpine-chrome:latest` — Zenika's Alpine + Chromium image, entrypoint `chromium-browser`. Used for HTML → PDF (the PDF flow is two-stage: pandoc image emits HTML using viewer-template.html, chromium image prints it). Override via `--convert-chromium-image=` / `ZDDC_CONVERT_CHROMIUM_IMAGE`.
|
||||
- Engine is podman preferred, docker fallback (`--convert-engine=` / `ZDDC_CONVERT_ENGINE` to override). No host pandoc or chromium needed.
|
||||
- Each conversion runs in a throw-away container with `--rm --pull=missing --network=none --read-only --tmpfs=/tmp:size=128m,exec --memory --cpus --pids-limit --cap-drop=ALL --security-opt=no-new-privileges --env=HOME=/tmp`. Resource caps via `--convert-mem-mib` (default 512), `--convert-cpus` (default "2"), `--convert-pids` (default 100), `--convert-timeout` (default 30s). `--user` is intentionally not set so each image uses its default (root for pandoc/latex, uid 1000 for alpine-chrome) — the other flags already provide strong isolation and overriding the user would break alpine-chrome's user-data-dir layout.
|
||||
- I/O via bind mount + stdin/stdout. Pandoc reads markdown from stdin, writes to stdout. The viewer template is bind-mounted read-only at `/tpl`. Chromium reads HTML from a read-write bind mount at `/pdf` and writes the PDF to the same mount; the host reads it back.
|
||||
- **Two engines, probed bwrap → podman → docker.** The first one found on PATH wins; `--convert-engine=` / `ZDDC_CONVERT_ENGINE` forces a choice.
|
||||
|
||||
- **bwrap (production default).** Wraps `bubblewrap` to run `pandoc` and `chromium-browser` directly in a per-call Linux-namespace sandbox: `--unshare-all --unshare-user-try --die-with-parent --ro-bind /usr /usr ... --proc /proc --dev /dev --tmpfs /tmp --clearenv`. No daemon, no socket, no OCI image pull at conversion time. Binaries are baked into the zddc-server runtime image (`zddc/runtime.Containerfile`) so the operator just runs the image. Configure binary names via `--convert-pandoc-binary` (default `pandoc`) / `--convert-chromium-binary` (default `chromium-browser`; Debian/Ubuntu install it as `chromium`).
|
||||
|
||||
- **podman / docker (legacy fallback).** Wraps `podman run` / `docker run` with `--rm --pull=missing --network=none --read-only --tmpfs=/tmp:size=256m,exec --memory --cpus --pids-limit --cap-drop=ALL --security-opt=no-new-privileges --env=HOME=/tmp`. Used when the operator wants OCI-image isolation per conversion and already has an engine on PATH. Default images `docker.io/pandoc/latex:latest` (override via `--convert-pandoc-image=` / `ZDDC_CONVERT_PANDOC_IMAGE`) and `docker.io/zenika/alpine-chrome:latest` (override via `--convert-chromium-image=`).
|
||||
|
||||
- Resource caps via `--convert-mem-mib` (default 512), `--convert-cpus` (default "2"), `--convert-pids` (default 100), `--convert-timeout` (default 30s). bwrap treats them as advisory (no cgroup enforcement in this iteration); the OCI engine maps them to `--memory` / `--cpus` / `--pids-limit`.
|
||||
- I/O via bind mount + stdin/stdout. Pandoc reads markdown from stdin, writes to stdout. The viewer template is bind-mounted read-only at `/tpl`. Chromium reads HTML from a read-write bind mount at `/pdf` and writes the PDF to the same mount; the host reads it back. Mount-spec syntax (`host:target[:ro|:rw]`) is identical across engines; the runner translates to `--ro-bind` / `--bind` (bwrap) or `--volume` (podman/docker).
|
||||
- Output cached at `<dir>/.converted/<base>.<ext>` (hidden by the `.` prefix). mtime synced to source so the fast path is a stat-and-serve with no exec. PUT/DELETE/MOVE on the source `.md` purges the sidecars.
|
||||
- Per-project template variables (client/project/contractor/project_number) come from `.zddc` `convert:` cascade keys. Title/tracking_number/revision/status are derived from the filename via `zddc.ParseFilename`.
|
||||
- If neither podman nor docker is present, the endpoint serves 503 with a Retry-After. The rest of the server keeps working.
|
||||
- If no sandbox engine is found on PATH, the endpoint serves 503 with a Retry-After. The rest of the server keeps working.
|
||||
|
||||
## Form-data system (`form/` + zddc-server form handler)
|
||||
@ -403,7 +403,7 @@ Files at the root level are ignored. The grouping folder list and transmittal fo
|
|||
|
||||
**Dependencies:** Toast UI Editor v3.2.2 (vendored at `shared/vendor/toastui-editor-all.min.js`, concatenated into `browse/dist/browse.html` at build time). No runtime CDN, no Tailwind.
|
||||
|
||||
**Server-mode features:** When the file handle is an `HttpFileHandle` (so `node.url` is set and `state.source === 'server'`), three Download buttons appear in the file header — DOCX/HTML/PDF — fetching `?convert=<fmt>` via `window.zddc.source.downloadConverted()`. Clicks auto-save first if the buffer is dirty so converted bytes reflect what's on screen. See `zddc/internal/convert` for the server-side engine.
|
||||
**Server-mode features:** When the file handle is an `HttpFileHandle` (so `node.url` is set and `state.source === 'server'`), three Download buttons appear in the file header — DOCX/HTML/PDF — fetching `?convert=<fmt>` via `window.zddc.source.downloadConverted()`. Clicks auto-save first if the buffer is dirty so converted bytes reflect what's on screen. The server-side engine is in `zddc/internal/convert` — bwrap is the default sandbox (per-call Linux namespaces, no daemon, pandoc/chromium binaries baked into the runtime image), with podman/docker as legacy OCI-image fallbacks for hosts that already have a container engine.
|
||||
|
||||
---
|
||||
@ -108,11 +108,16 @@ buildImage:
|
|||
tag: 1.24-alpine
|
||||
# digest: sha256:...
|
||||
|
||||
# Runtime image (main container). Must contain a basic shell + libc;
|
||||
# the static binary is copied in by the init container. Alpine is fine.
|
||||
# Runtime image (main container). Hosts the zddc-server binary copied
|
||||
# in by the init container, plus the conversion toolchain (pandoc,
|
||||
# chromium, bubblewrap) used by the /.convert endpoint. Build from
|
||||
# `zddc/runtime.Containerfile` and publish to your registry; the
|
||||
# Containerfile documents the build/publish commands. Plain alpine
|
||||
# does NOT have the conversion tools — the /.convert endpoint will
|
||||
# serve 503 until you swap in a runtime image that bundles them.
|
||||
runtimeImage:
|
||||
repository: docker.io/alpine
|
||||
tag: "3.19"
|
||||
repository: codeberg.org/varasys/zddc-server-runtime
|
||||
tag: "latest"
|
||||
# digest: sha256:...
|
||||
|
||||
# Image pull credentials, if your registry requires them. Reference a
|
||||
@ -92,13 +92,18 @@ func main() {
|
|||
// socket is unreachable in sidecar mode), conversion requests
|
||||
// return 503 and everything else keeps working. The probe installs
|
||||
// the package-level Runner when an engine is found; the configured
|
||||
// image refs are pulled lazily on first conversion via
|
||||
// `--pull=missing` so there's no manual setup beyond installing
|
||||
// podman or docker.
|
||||
// Sandbox probe order is bwrap → podman → docker. The
|
||||
// production-default bwrap engine reads the binary names below
|
||||
// (pandoc + chromium are baked into the zddc-server image);
|
||||
// the legacy OCI engines read the image refs and pull them
|
||||
// lazily on first conversion via `--pull=missing`. The probe
|
||||
// installs whichever runner the engine resolves to.
|
||||
//
|
||||
// SetRemoteURL + SetScratchDir must run BEFORE Probe so the probe
|
||||
// can hit the sidecar socket when one is configured.
|
||||
// SetRemoteURL + SetScratchDir must run BEFORE Probe so the
|
||||
// OCI-engine path can hit the sidecar socket when one is
|
||||
// configured; bwrap ignores both.
|
||||
convert.SetImages(cfg.ConvertPandocImage, cfg.ConvertChromiumImage)
|
||||
convert.SetBinaries(cfg.ConvertPandocBinary, cfg.ConvertChromiumBinary)
|
||||
convert.SetRemoteURL(cfg.ConvertPodmanSocket)
|
||||
convert.SetScratchDir(cfg.ConvertScratchDir)
|
||||
probeCtx, probeCancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||
@ -50,12 +50,18 @@ type Config struct {
|
|||
// MD→{docx,html,pdf} conversion endpoint (see internal/convert).
|
||||
// The server shells out to upstream pandoc + chromium container
|
||||
// images via podman or docker, pulling each on first use via
|
||||
// `--pull=missing`. No custom image build is required — only that
|
||||
// podman or docker is on PATH and the configured image refs are
|
||||
// reachable. If no runtime is found the endpoint serves 503.
|
||||
ConvertPandocImage string // --convert-pandoc-image / ZDDC_CONVERT_PANDOC_IMAGE — image for MD→DOCX/HTML. Default docker.io/pandoc/latex:latest.
|
||||
ConvertChromiumImage string // --convert-chromium-image / ZDDC_CONVERT_CHROMIUM_IMAGE — image for HTML→PDF. Default docker.io/zenika/alpine-chrome:latest.
|
||||
ConvertEngine string // --convert-engine / ZDDC_CONVERT_ENGINE — override engine binary (default: probe for podman, then docker).
|
||||
// production default. The engine probe order is bwrap → podman →
|
||||
// docker; the first one found on PATH wins. bwrap runs the
|
||||
// pandoc + chromium binaries baked into the zddc-server image
|
||||
// in a per-call Linux-namespace sandbox (no daemon, no socket,
|
||||
// no OCI image pull). podman/docker are legacy fallbacks for
|
||||
// hosts that already have a container engine and want OCI-image
|
||||
// isolation per conversion.
|
||||
ConvertPandocImage string // --convert-pandoc-image / ZDDC_CONVERT_PANDOC_IMAGE — image for MD→DOCX/HTML when the OCI engine is selected. Default docker.io/pandoc/latex:latest.
|
||||
ConvertChromiumImage string // --convert-chromium-image / ZDDC_CONVERT_CHROMIUM_IMAGE — image for HTML→PDF when the OCI engine is selected. Default docker.io/zenika/alpine-chrome:latest.
|
||||
ConvertPandocBinary string // --convert-pandoc-binary / ZDDC_CONVERT_PANDOC_BINARY — pandoc binary name (PATH-resolved) when the bwrap engine is selected. Default "pandoc".
|
||||
ConvertChromiumBinary string // --convert-chromium-binary / ZDDC_CONVERT_CHROMIUM_BINARY — chromium binary name (PATH-resolved) when the bwrap engine is selected. Default "chromium-browser" (alpine); set to "chromium" on debian.
|
||||
ConvertEngine string // --convert-engine / ZDDC_CONVERT_ENGINE — override sandbox binary (default: probe for bwrap, then podman, then docker).
|
||||
ConvertPodmanSocket string // --convert-podman-socket / ZDDC_CONVERT_PODMAN_SOCKET — when non-empty, run podman in remote mode against this Unix socket (e.g. unix:///var/run/podman/podman.sock). Used with the Kubernetes sidecar pattern so zddc-server's own pod stays unprivileged.
|
||||
ConvertScratchDir string // --convert-scratch-dir / ZDDC_CONVERT_SCRATCH_DIR — directory used for per-conversion scratch (template + HTML/PDF intermediates). Must be a path the remote podman can see at the same path. Empty = use $TMPDIR (local-mode default).
|
||||
ConvertMemMiB int // --convert-mem-mib / ZDDC_CONVERT_MEM_MIB — per-container memory cap in MiB. Default 512.
|
||||
|
|
@ -141,11 +147,15 @@ func Load(args []string) (Config, error) {
|
|||
archiveRescanIntervalFlag := fs.Duration("archive-rescan-interval", parseDurationOrDefault(os.Getenv("ZDDC_ARCHIVE_RESCAN_INTERVAL"), 60*time.Second),
|
||||
"Periodic full re-walk of the archive index. Required on SMB/CIFS-backed roots where inotify misses cross-client writes. Default 60s; set 0 to disable.")
|
||||
convertPandocImageFlag := fs.String("convert-pandoc-image", getEnv("ZDDC_CONVERT_PANDOC_IMAGE", "docker.io/pandoc/latex:latest"),
|
||||
"Pandoc container image for MD→DOCX and MD→HTML. Pulled on first use via --pull=missing.")
|
||||
"Pandoc OCI image for MD→DOCX / MD→HTML, used only when the OCI engine (podman/docker) is selected. Pulled on first use via --pull=missing.")
|
||||
convertChromiumImageFlag := fs.String("convert-chromium-image", getEnv("ZDDC_CONVERT_CHROMIUM_IMAGE", "docker.io/zenika/alpine-chrome:latest"),
|
||||
"Headless Chromium container image for HTML→PDF. Pulled on first use via --pull=missing.")
|
||||
"Chromium OCI image for HTML→PDF, used only when the OCI engine is selected. Pulled on first use via --pull=missing.")
|
||||
convertPandocBinaryFlag := fs.String("convert-pandoc-binary", getEnv("ZDDC_CONVERT_PANDOC_BINARY", "pandoc"),
|
||||
"Pandoc binary name (PATH-resolved) when the bwrap engine is selected. Default \"pandoc\".")
|
||||
convertChromiumBinaryFlag := fs.String("convert-chromium-binary", getEnv("ZDDC_CONVERT_CHROMIUM_BINARY", "chromium-browser"),
|
||||
"Chromium binary name (PATH-resolved) when the bwrap engine is selected. Default \"chromium-browser\" (alpine); set to \"chromium\" on debian/ubuntu.")
|
||||
convertEngineFlag := fs.String("convert-engine", os.Getenv("ZDDC_CONVERT_ENGINE"),
|
||||
"Container engine override (default: probe for podman, then docker).")
|
||||
"Conversion sandbox override (default: probe for bwrap, then podman, then docker).")
|
||||
convertPodmanSocketFlag := fs.String("convert-podman-socket", os.Getenv("ZDDC_CONVERT_PODMAN_SOCKET"),
|
||||
"Run podman in remote mode against this Unix socket URL (e.g. unix:///var/run/podman/podman.sock). When set, the engine binary is invoked as `podman --remote --url=<this> run …`; the actual container creation happens in whatever process owns the socket (typically a podman-system-service sidecar). Empty = local mode.")
|
||||
convertScratchDirFlag := fs.String("convert-scratch-dir", os.Getenv("ZDDC_CONVERT_SCRATCH_DIR"),
|
||||
|
|
@ -231,6 +241,8 @@ func Load(args []string) (Config, error) {
|
|||
ArchiveRescanInterval: *archiveRescanIntervalFlag,
|
||||
ConvertPandocImage: *convertPandocImageFlag,
|
||||
ConvertChromiumImage: *convertChromiumImageFlag,
|
||||
ConvertPandocBinary: *convertPandocBinaryFlag,
|
||||
ConvertChromiumBinary: *convertChromiumBinaryFlag,
|
||||
ConvertEngine: *convertEngineFlag,
|
||||
ConvertPodmanSocket: *convertPodmanSocketFlag,
|
||||
ConvertScratchDir: *convertScratchDirFlag,
|
||||
@ -1,10 +1,20 @@
|
|||
// Package convert turns a markdown source byte-buffer into DOCX, HTML,
|
||||
// or PDF via two stock upstream container images: pandoc (default
|
||||
// `docker.io/pandoc/latex:latest`) handles MD↔DOCX and MD→HTML, and
|
||||
// a headless-chromium image (default `docker.io/zenika/alpine-chrome:latest`)
|
||||
// handles HTML→PDF. No custom image build is required — the operator
|
||||
// just needs `podman` or `docker` on PATH and the runner pulls each
|
||||
// image on first use via `--pull=missing`.
|
||||
// or PDF. Pandoc handles MD↔DOCX and MD→HTML; headless Chromium handles
|
||||
// HTML→PDF. Each conversion runs inside an isolating sandbox so an
|
||||
// untrusted source-markdown can't reach the host's filesystem or
|
||||
// network even if it drives the binary to RCE.
|
||||
//
|
||||
// Engine probe order (call Probe once at startup, first hit wins):
|
||||
//
|
||||
// 1. bwrap (production default). Runs the pandoc/chromium binaries
|
||||
// baked into the zddc-server runtime image directly under
|
||||
// bubblewrap: namespace-isolated, no network, read-only /usr, a
|
||||
// 256 MiB tmpfs /tmp, minimal proc/dev. Configure binary names
|
||||
// via SetBinaries; defaults are `pandoc` and `chromium-browser`.
|
||||
// 2. podman / docker (legacy fallback). Runs each conversion inside
|
||||
// an OCI container pulled lazily via `--pull=missing`. Defaults
|
||||
// `docker.io/pandoc/latex:latest` + `docker.io/zenika/alpine-
|
||||
// chrome:latest`; configure via SetImages.
|
||||
//
|
||||
// Public surface:
|
||||
//
|
||||
|
|
@ -14,13 +24,14 @@
|
|||
//
|
||||
// Probe(ctx, override) → Capabilities (call once at startup)
|
||||
// Available() → (Capabilities, bool)
|
||||
// SetImages(pandoc, chromium) — install image refs from config
|
||||
// SetImages(pandoc, chromium) — install OCI image refs from config
|
||||
// SetBinaries(pandoc, chromium) — install bwrap binary names from config
|
||||
//
|
||||
// All three converters are safe for concurrent use; each call gets a
|
||||
// fresh container. The pandoc image's entrypoint is `pandoc`, so the
|
||||
// argv we pass after the image flows straight into pandoc. The
|
||||
// alpine-chrome image's entrypoint is `chromium-browser`, so the argv
|
||||
// flows into chromium-browser. No `sh -c` wrappers, no shell quoting.
|
||||
// fresh sandbox. The pandoc binary (or pandoc/latex image's entrypoint)
|
||||
// reads pandoc flags directly; the chromium binary (or alpine-chrome
|
||||
// image's entrypoint) reads chromium-browser flags. No `sh -c`
|
||||
// wrappers, no shell quoting.
|
||||
//
|
||||
// Metadata maps to the placeholders consumed by viewer-template.html.
|
||||
// title/tracking_number/revision/status/is_draft typically come from
|
||||
|
|
@ -55,25 +66,39 @@ type Metadata struct {
|
|||
NoTOC bool
|
||||
}
|
||||
|
||||
// Default images. Operator overrides via --convert-pandoc-image /
|
||||
// --convert-chromium-image (see cmd/zddc-server). pandoc/latex carries
|
||||
// TeX Live for native PDF too, so it's a superset of pandoc/core;
|
||||
// operators wanting a slimmer footprint can switch to pandoc/core.
|
||||
// Default tool refs. The bwrap engine (default since v0.0.x) reads the
|
||||
// Binary fields below; the legacy containerRunner reads the Image
|
||||
// fields. The convert entry points populate both into a ToolSpec so
|
||||
// whichever engine is installed picks the field it needs.
|
||||
//
|
||||
// pandoc/latex carries TeX Live for native PDF too, so the image is a
|
||||
// superset of pandoc/core. The bwrap engine doesn't pay that cost —
|
||||
// each binary is installed from the host's package manager (alpine:
|
||||
// pandoc-cli + chromium) and the image grows by ≈ 200 MB once.
|
||||
const (
|
||||
DefaultPandocImage = "docker.io/pandoc/latex:latest"
|
||||
DefaultChromiumImage = "docker.io/zenika/alpine-chrome:latest"
|
||||
DefaultPandocBinary = "pandoc"
|
||||
// Alpine's chromium package installs the binary as "chromium-browser".
|
||||
// Debian/Ubuntu ships "chromium". Operators override via
|
||||
// --convert-chromium-binary when the package on their image differs.
|
||||
DefaultChromiumBinary = "chromium-browser"
|
||||
)
|
||||
|
||||
var (
|
||||
pandocImage atomic.Pointer[string]
|
||||
chromiumImage atomic.Pointer[string]
|
||||
pandocBinary atomic.Pointer[string]
|
||||
chromiumBinary atomic.Pointer[string]
|
||||
scratchDir atomic.Pointer[string]
|
||||
)
|
||||
|
||||
// SetImages installs the image refs used for subsequent ToDocx/ToHTML/
|
||||
// ToPDF calls. Empty values keep the previous setting (or the
|
||||
// DefaultPandocImage / DefaultChromiumImage constants on first call).
|
||||
// Called from cmd/zddc-server/main.go after flag parsing.
|
||||
// SetImages installs the OCI image refs used by the legacy
|
||||
// containerRunner engine. The bwrap engine ignores these and reads
|
||||
// the binary names installed via SetBinaries instead. Empty values
|
||||
// keep the previous setting (or the DefaultPandocImage /
|
||||
// DefaultChromiumImage constants on first call). Called from
|
||||
// cmd/zddc-server/main.go after flag parsing.
|
||||
func SetImages(pandoc, chromium string) {
|
||||
if pandoc != "" {
|
||||
s := pandoc
|
||||
|
|
@ -85,6 +110,22 @@ func SetImages(pandoc, chromium string) {
|
|||
}
|
||||
}
|
||||
|
||||
// SetBinaries installs the host-binary names used by the bwrap engine.
|
||||
// Empty values keep the previous setting (or the DefaultPandocBinary /
|
||||
// DefaultChromiumBinary constants on first call). The values are
|
||||
// PATH-resolved names (e.g. "pandoc", "chromium-browser") or absolute
|
||||
// paths. Called from cmd/zddc-server/main.go after flag parsing.
|
||||
func SetBinaries(pandoc, chromium string) {
|
||||
if pandoc != "" {
|
||||
s := pandoc
|
||||
pandocBinary.Store(&s)
|
||||
}
|
||||
if chromium != "" {
|
||||
s := chromium
|
||||
chromiumBinary.Store(&s)
|
||||
}
|
||||
}
|
||||
|
||||
// SetScratchDir installs the host-side scratch root used for per-call
|
||||
// intermediates (template, HTML, PDF). Empty means "use $TMPDIR" — the
|
||||
// local-mode default. In remote mode this MUST be a path the podman-
|
||||
|
|
@ -117,6 +158,31 @@ func currentChromiumImage() string {
|
|||
return DefaultChromiumImage
|
||||
}
|
||||
|
||||
func currentPandocBinary() string {
|
||||
if p := pandocBinary.Load(); p != nil && *p != "" {
|
||||
return *p
|
||||
}
|
||||
return DefaultPandocBinary
|
||||
}
|
||||
|
||||
func currentChromiumBinary() string {
|
||||
if p := chromiumBinary.Load(); p != nil && *p != "" {
|
||||
return *p
|
||||
}
|
||||
return DefaultChromiumBinary
|
||||
}
|
||||
|
||||
// pandocTool / chromiumTool build the ToolSpec passed to Runner.Run.
|
||||
// Both fields are populated so whichever engine is installed picks
|
||||
// the one it needs (bwrap reads Binary; containerRunner reads Image).
|
||||
func pandocTool() ToolSpec {
|
||||
return ToolSpec{Image: currentPandocImage(), Binary: currentPandocBinary()}
|
||||
}
|
||||
|
||||
func chromiumTool() ToolSpec {
|
||||
return ToolSpec{Image: currentChromiumImage(), Binary: currentChromiumBinary()}
|
||||
}
|
||||
|
||||
// ToDocx renders source markdown to DOCX bytes. One container run via
|
||||
// the pandoc image. Caller passes the full file content (envelope +
|
||||
// body); pandoc handles `markdown+yaml_metadata_block` natively.
|
||||
|
|
@ -132,7 +198,7 @@ func ToDocx(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
|
|||
}
|
||||
cmd = append(cmd, metadataArgs(m)...)
|
||||
cmd = append(cmd, "-")
|
||||
return r.Run(ctx, currentPandocImage(), source, nil, cmd)
|
||||
return r.Run(ctx, pandocTool(), source, nil, cmd)
|
||||
}
|
||||
|
||||
// ToHTML renders source markdown to standalone HTML using
|
||||
|
|
@ -167,7 +233,7 @@ func ToHTML(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
|
|||
cmd = append(cmd, "--output=-", "-")
|
||||
|
||||
mounts := []string{scratch + ":/tpl:ro"}
|
||||
return r.Run(ctx, currentPandocImage(), source, mounts, cmd)
|
||||
return r.Run(ctx, pandocTool(), source, mounts, cmd)
|
||||
}
|
||||
|
||||
// ToPDF renders source markdown to PDF in two stages: pandoc produces
|
||||
|
|
@ -227,7 +293,7 @@ func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
|
|||
"--print-to-pdf=/pdf/out.pdf",
|
||||
"file:///pdf/in.html",
|
||||
}
|
||||
if _, err := r.Run(ctx, currentChromiumImage(), nil, mounts, cmd); err != nil {
|
||||
if _, err := r.Run(ctx, chromiumTool(), nil, mounts, cmd); err != nil {
|
||||
return nil, err
|
||||
}
|
||||
@ -15,18 +15,18 @@ import (
|
|||
type fakeRunner struct {
|
||||
mu sync.Mutex
|
||||
calls [][]string
|
||||
images []string
|
||||
tools []ToolSpec
|
||||
stdin [][]byte
|
||||
mounts [][]string
|
||||
resp []byte
|
||||
err error
|
||||
}
|
||||
|
||||
func (f *fakeRunner) Run(_ context.Context, image string, stdin []byte, mounts []string, cmd []string) ([]byte, error) {
|
||||
func (f *fakeRunner) Run(_ context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error) {
|
||||
f.mu.Lock()
|
||||
defer f.mu.Unlock()
|
||||
f.calls = append(f.calls, append([]string(nil), cmd...))
|
||||
f.images = append(f.images, image)
|
||||
f.tools = append(f.tools, tool)
|
||||
f.stdin = append(f.stdin, append([]byte(nil), stdin...))
|
||||
f.mounts = append(f.mounts, append([]string(nil), mounts...))
|
||||
return f.resp, f.err
|
||||
|
|
@ -38,7 +38,7 @@ func (f *fakeRunner) lastCall() (string, []string) {
|
|||
if len(f.calls) == 0 {
|
||||
return "", nil
|
||||
}
|
||||
return f.images[len(f.images)-1], f.calls[len(f.calls)-1]
|
||||
return f.tools[len(f.tools)-1].Image, f.calls[len(f.calls)-1]
|
||||
}
|
||||
|
||||
func TestToDocx_UsesPandocImage(t *testing.T) {
|
||||
|
|
@ -137,11 +137,11 @@ type recordedCall struct {
|
|||
mounts []string
|
||||
}
|
||||
|
||||
func (r *recordingRunner) Run(_ context.Context, image string, _ []byte, mounts []string, cmd []string) ([]byte, error) {
|
||||
func (r *recordingRunner) Run(_ context.Context, tool ToolSpec, _ []byte, mounts []string, cmd []string) ([]byte, error) {
|
||||
r.mu.Lock()
|
||||
defer r.mu.Unlock()
|
||||
r.calls = append(r.calls, recordedCall{
|
||||
image: image,
|
||||
image: tool.Image,
|
||||
cmd: append([]string(nil), cmd...),
|
||||
mounts: append([]string(nil), mounts...),
|
||||
})
|
||||
|
|
@ -305,3 +305,113 @@ func contains(haystack []string, needle string) bool {
|
|||
}
|
||||
return false
|
||||
}
|
||||
|
||||
// TestToolSpecPopulation: the convert entry points populate BOTH the
|
||||
// Image and Binary fields of ToolSpec, so the runner-of-the-day can
|
||||
// pick whichever it needs. bwrapRunner reads Binary; containerRunner
|
||||
// reads Image; the call site doesn't know which is installed.
|
||||
func TestToolSpecPopulation(t *testing.T) {
|
||||
f := &fakeRunner{resp: []byte("ok")}
|
||||
InstallRunner(f)
|
||||
t.Cleanup(func() { InstallRunner(nil) })
|
||||
SetImages("docker.io/pandoc/latex:1.0", "docker.io/zenika/alpine-chrome:2.0")
|
||||
SetBinaries("/opt/bin/pandoc", "/opt/bin/chromium")
|
||||
t.Cleanup(func() { SetImages("", ""); SetBinaries("", "") })
|
||||
|
||||
if _, err := ToDocx(context.Background(), []byte("# x\n"), Metadata{}); err != nil {
|
||||
t.Fatalf("ToDocx: %v", err)
|
||||
}
|
||||
if len(f.tools) != 1 {
|
||||
t.Fatalf("want 1 tool call, got %d", len(f.tools))
|
||||
}
|
||||
got := f.tools[0]
|
||||
if got.Image != "docker.io/pandoc/latex:1.0" {
|
||||
t.Errorf("Image = %q, want docker.io/pandoc/latex:1.0", got.Image)
|
||||
}
|
||||
if got.Binary != "/opt/bin/pandoc" {
|
||||
t.Errorf("Binary = %q, want /opt/bin/pandoc", got.Binary)
|
||||
}
|
||||
}
|
||||
|
||||
// TestBwrapArgs_SandboxFlagsPresent locks in the bwrap argv shape.
|
||||
// Every conversion must run with these hardening flags — the whole
|
||||
// point of bwrap-as-default is that the sandbox is built into every
|
||||
// invocation. A refactor that drops any of them needs to fail this
|
||||
// test loudly.
|
||||
func TestBwrapArgs_SandboxFlagsPresent(t *testing.T) {
|
||||
args, err := buildBwrapArgs("pandoc", nil, []string{"--from=markdown", "--to=docx", "-"})
|
||||
if err != nil {
|
||||
t.Fatalf("buildBwrapArgs: %v", err)
|
||||
}
|
||||
mustHave := []string{
|
||||
"--unshare-all", // user (try) + ipc + pid + net + uts + cgroup (try)
|
||||
"--unshare-user-try", // user-namespace when kernel allows
|
||||
"--die-with-parent", // cleanup when zddc-server exits
|
||||
"--proc", // minimal /proc
|
||||
"--dev", // minimal /dev
|
||||
"--tmpfs", // writable /tmp scratch
|
||||
"--clearenv", // no host env leaks
|
||||
}
|
||||
for _, flag := range mustHave {
|
||||
if !contains(args, flag) {
|
||||
t.Errorf("bwrap args missing sandbox flag %q: %v", flag, args)
|
||||
}
|
||||
}
|
||||
// /usr must be bind-mounted read-only — that's how the binary
|
||||
// + its dynamic libs are visible inside the sandbox. The
|
||||
// "--ro-bind /usr /usr" triple must appear consecutively.
|
||||
if i := indexOfTriple(args, "--ro-bind", "/usr", "/usr"); i < 0 {
|
||||
t.Errorf("bwrap args missing --ro-bind /usr /usr: %v", args)
|
||||
}
|
||||
// Binary + caller-cmd come last, in order.
|
||||
last := args[len(args)-4:]
|
||||
want := []string{"pandoc", "--from=markdown", "--to=docx", "-"}
|
||||
for i, w := range want {
|
||||
if last[i] != w {
|
||||
t.Errorf("trailing args[%d] = %q, want %q", i, last[i], w)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestBwrapArgs_MountTranslation: caller "host:target:ro" → bwrap
|
||||
// "--ro-bind host target"; "host:target:rw" → "--bind host target";
|
||||
// no mode segment defaults to ro (mirroring containerRunner).
|
||||
func TestBwrapArgs_MountTranslation(t *testing.T) {
|
||||
args, err := buildBwrapArgs("pandoc",
|
||||
[]string{"/host/tpl:/tpl:ro", "/host/pdf:/pdf:rw", "/host/x:/x"},
|
||||
nil)
|
||||
if err != nil {
|
||||
t.Fatalf("buildBwrapArgs: %v", err)
|
||||
}
|
||||
if i := indexOfTriple(args, "--ro-bind", "/host/tpl", "/tpl"); i < 0 {
|
||||
t.Errorf("missing --ro-bind /host/tpl /tpl: %v", args)
|
||||
}
|
||||
if i := indexOfTriple(args, "--bind", "/host/pdf", "/pdf"); i < 0 {
|
||||
t.Errorf("missing --bind /host/pdf /pdf: %v", args)
|
||||
}
|
||||
if i := indexOfTriple(args, "--ro-bind", "/host/x", "/x"); i < 0 {
|
||||
t.Errorf("missing default-ro --ro-bind /host/x /x: %v", args)
|
||||
}
|
||||
}
|
||||
|
||||
// TestBwrapArgs_RejectsBadMountSpec: a malformed mount string fails
|
||||
// fast, never reaches exec. Single-segment specs (no target) and
|
||||
// unknown modes both qualify.
|
||||
func TestBwrapArgs_RejectsBadMountSpec(t *testing.T) {
|
||||
for _, bad := range []string{"only-host", "/h:/t:weird", ""} {
|
||||
if _, err := buildBwrapArgs("pandoc", []string{bad}, nil); err == nil {
|
||||
t.Errorf("expected error for malformed mount %q", bad)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// indexOfTriple returns the index of `a` in args such that
|
||||
// args[i:i+3] == {a, b, c}, or -1.
|
||||
func indexOfTriple(args []string, a, b, c string) int {
|
||||
for i := 0; i+2 < len(args); i++ {
|
||||
if args[i] == a && args[i+1] == b && args[i+2] == c {
|
||||
return i
|
||||
}
|
||||
}
|
||||
return -1
|
||||
}
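The tests above call a `contains` helper that is not part of this hunk. A minimal sketch of what it might look like, assuming it is a plain linear scan over the argv (on Go 1.21+ the standard `slices.Contains` does the same job):

```go
package main

import "fmt"

// contains reports whether want appears anywhere in args.
// Hypothetical stand-in for the helper the tests call; the real
// definition is not shown in this diff.
func contains(args []string, want string) bool {
	for _, a := range args {
		if a == want {
			return true
		}
	}
	return false
}

func main() {
	args := []string{"--unshare-all", "--ro-bind", "/usr", "/usr", "pandoc"}
	fmt.Println(contains(args, "--unshare-all")) // flag is present
	fmt.Println(contains(args, "--share-net"))   // flag is absent
}
```

Note that a scan like this matches flag values as well as flag names, which is why the `/usr` bind is checked with `indexOfTriple` instead.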
|
||||
|
|
|
|||
|
|
@@ -21,24 +21,25 @@ var remoteURL atomic.Pointer[string]
|
|||
// conversion time, so a missing image surfaces as a normal
|
||||
// ConvertError (not a probe failure).
|
||||
//
|
||||
// Mode is "local" when the engine creates containers in the same
|
||||
// process as zddc-server, or "remote" when zddc-server is the client
|
||||
// of a podman-system-service sidecar (see ContainerRunner doc).
|
||||
// Mode applies to OCI engines (podman/docker): "local" when the
|
||||
// engine creates containers in the same process as zddc-server,
|
||||
// "remote" when zddc-server is the client of a podman-system-service
|
||||
// sidecar. The bwrap engine has no mode (always direct exec).
|
||||
type Capabilities struct {
|
||||
Engine string // "podman" | "docker" | ""
|
||||
Engine string // "bwrap" | "podman" | "docker" | ""
|
||||
EngineVer string // first line of "<engine> --version"
|
||||
Mode string // "local" or "remote"
|
||||
RemoteURL string // populated in remote mode
|
||||
PandocImage string // resolved pandoc image ref
|
||||
ChromiumImage string // resolved chromium image ref
|
||||
Mode string // "local" or "remote" (OCI engines only)
|
||||
RemoteURL string // populated in remote mode (OCI engines only)
|
||||
PandocImage string // resolved pandoc image ref (OCI engines)
|
||||
ChromiumImage string // resolved chromium image ref (OCI engines)
|
||||
ProbedAt time.Time
|
||||
Err error
|
||||
}
|
||||
|
||||
// Ready reports whether conversions can be attempted. The first
|
||||
// conversion may still fail if the configured image isn't reachable
|
||||
// from the host's registry (the runner will surface a clear error
|
||||
// from podman/docker stderr).
|
||||
// conversion may still fail if the configured binary or image isn't
|
||||
// actually present (the runner will surface a clear error from the
|
||||
// child process's stderr).
|
||||
func (c Capabilities) Ready() bool {
|
||||
return c.Engine != "" && c.Err == nil
|
||||
}
|
||||
|
|
@@ -47,7 +48,7 @@ func (c Capabilities) Ready() bool {
|
|||
// false. Used as the body of a 503.
|
||||
func (c Capabilities) Reason() string {
|
||||
if c.Engine == "" {
|
||||
return "no container runtime (podman or docker) found on PATH"
|
||||
return "no conversion sandbox found (looked for bwrap, podman, docker on PATH)"
|
||||
}
|
||||
if c.Err != nil {
|
||||
if c.Mode == "remote" {
|
||||
|
|
@@ -123,33 +124,53 @@ func Probe(ctx context.Context, engineOverride string) Capabilities {
|
|||
c.Mode = "remote"
|
||||
}
|
||||
|
||||
engine := resolveEngine(engineOverride)
|
||||
if engine == "" {
|
||||
c.Err = fmt.Errorf("no container runtime found (tried: %s)", strings.Join(enginesTried(engineOverride), ", "))
|
||||
enginePath := resolveEngine(engineOverride)
|
||||
if enginePath == "" {
|
||||
c.Err = fmt.Errorf("no conversion sandbox found (tried: %s)", strings.Join(enginesTried(engineOverride), ", "))
|
||||
caps.Store(&c)
|
||||
slog.Warn("convert: probe failed", "reason", c.Err.Error())
|
||||
return c
|
||||
}
|
||||
c.Engine = engine
|
||||
kind := engineKind(enginePath)
|
||||
c.Engine = kind
|
||||
|
||||
if v, err := probeVersion(ctx, engine); err == nil {
|
||||
if v, err := probeVersion(ctx, enginePath); err == nil {
|
||||
c.EngineVer = v
|
||||
}
|
||||
|
||||
// bwrap engine: no remote-mode concept, just install the runner.
|
||||
// The bwrap binary IS the sandbox; conversion binaries (pandoc,
|
||||
// chromium) are resolved separately from PATH at call time and
|
||||
// reported by the convert-health endpoint when ready.
|
||||
if kind == "bwrap" {
|
||||
InstallRunner(newBwrapRunner(enginePath))
|
||||
caps.Store(&c)
|
||||
slog.Info("convert: ready",
|
||||
"engine", kind,
|
||||
"engine_path", enginePath,
|
||||
"engine_version", c.EngineVer,
|
||||
"pandoc_binary", currentPandocBinary(),
|
||||
"chromium_binary", currentChromiumBinary())
|
||||
return c
|
||||
}
|
||||
|
||||
// Legacy OCI engine (podman/docker). Optional remote-socket
|
||||
// connectivity check, then install containerRunner.
|
||||
if rURL != "" {
|
||||
if err := probeRemoteSocket(ctx, engine, rURL); err != nil {
|
||||
if err := probeRemoteSocket(ctx, enginePath, rURL); err != nil {
|
||||
c.Err = err
|
||||
caps.Store(&c)
|
||||
slog.Warn("convert: remote socket probe failed",
|
||||
"engine", engine, "remote_url", rURL, "err", err)
|
||||
"engine", kind, "remote_url", rURL, "err", err)
|
||||
return c
|
||||
}
|
||||
}
|
||||
|
||||
InstallRunner(newContainerRunner(engine, rURL))
|
||||
InstallRunner(newContainerRunner(enginePath, rURL))
|
||||
caps.Store(&c)
|
||||
slog.Info("convert: ready",
|
||||
"engine", engine,
|
||||
"engine", kind,
|
||||
"engine_path", enginePath,
|
||||
"engine_version", c.EngineVer,
|
||||
"mode", c.Mode,
|
||||
"remote_url", c.RemoteURL,
|
||||
|
|
@@ -193,7 +214,11 @@ func resolveEngine(override string) string {
|
|||
}
|
||||
return ""
|
||||
}
|
||||
for _, name := range []string{"podman", "docker"} {
|
||||
// Probe order: bwrap (production default — lightest sandbox, no
|
||||
// daemon, no OCI engine), then podman / docker as legacy fallbacks
|
||||
// for hosts that already have a container engine and want OCI-image
|
||||
// isolation per conversion.
|
||||
for _, name := range []string{"bwrap", "podman", "docker"} {
|
||||
if p, err := exec.LookPath(name); err == nil {
|
||||
return p
|
||||
}
|
||||
|
|
@@ -205,7 +230,27 @@ func enginesTried(override string) []string {
|
|||
if override != "" {
|
||||
return []string{override}
|
||||
}
|
||||
return []string{"podman", "docker"}
|
||||
return []string{"bwrap", "podman", "docker"}
|
||||
}
|
||||
|
||||
// engineKind returns the engine-family label for a resolved binary
|
||||
// path. "bwrap" is its own engine; "podman" and "docker" are the
|
||||
// OCI-container engines handled by containerRunner. Used by Probe to
|
||||
// pick the right Runner implementation.
|
||||
func engineKind(resolved string) string {
|
||||
base := resolved
|
||||
if i := strings.LastIndex(base, "/"); i >= 0 {
|
||||
base = base[i+1:]
|
||||
}
|
||||
switch base {
|
||||
case "bwrap":
|
||||
return "bwrap"
|
||||
case "podman", "podman-remote":
|
||||
return "podman"
|
||||
case "docker":
|
||||
return "docker"
|
||||
}
|
||||
return base
|
||||
}
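To make the mapping concrete, the sketch below copies `engineKind` from the hunk and runs it over a few sample paths. The paths are hypothetical, and `nerdctl` appears only to show the fall-through case: an unrecognised basename is returned unchanged, and since `Probe` only special-cases "bwrap", such a binary would be handed to `containerRunner`.

```go
package main

import (
	"fmt"
	"strings"
)

// engineKind is copied from the diff: strip the directory part,
// then map known basenames to an engine family.
func engineKind(resolved string) string {
	base := resolved
	if i := strings.LastIndex(base, "/"); i >= 0 {
		base = base[i+1:]
	}
	switch base {
	case "bwrap":
		return "bwrap"
	case "podman", "podman-remote":
		return "podman"
	case "docker":
		return "docker"
	}
	return base // unknown basenames pass through as-is
}

func main() {
	// Sample inputs are illustrative, not taken from the source.
	for _, p := range []string{"/usr/bin/bwrap", "/usr/bin/podman-remote", "docker", "/opt/bin/nerdctl"} {
		fmt.Printf("%s -> %s\n", p, engineKind(p))
	}
}
```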
|
||||
|
||||
func probeVersion(ctx context.Context, engine string) (string, error) {
|
||||
|
|
|
|||
|
|
@@ -15,23 +15,44 @@ import (
|
|||
"time"
|
||||
)
|
||||
|
||||
// Runner executes a conversion sub-process and returns its stdout.
|
||||
// The host-side implementation (containerRunner) wraps `podman run`
|
||||
// or `docker run`; tests use a fake.
|
||||
// ToolSpec identifies the conversion tool to invoke. Runners pick
|
||||
// whichever field applies to them:
|
||||
//
|
||||
// image is the OCI image to invoke (e.g. "docker.io/pandoc/latex:latest"
|
||||
// or "docker.io/zenika/alpine-chrome:latest"). stdin is piped to the
|
||||
// container's stdin. cmd is the argv passed *to the image's entrypoint*
|
||||
// — for pandoc/latex the entrypoint is `pandoc`, for alpine-chrome it
|
||||
// is `chromium-browser`. mounts is a list of "<hostPath>:<containerPath>"
|
||||
// specs handed to --volume (":ro" is added if no mode segment is
|
||||
// present).
|
||||
// - bwrapRunner uses Binary — the path or PATH-name of the tool on
|
||||
// the zddc-server host (or container). pandoc/latex's entrypoint
|
||||
// becomes `pandoc`; alpine-chrome's becomes `chromium-browser`.
|
||||
// This is the production-default engine: lightest sandbox, no
|
||||
// daemon, no privileged outer container.
|
||||
//
|
||||
// - containerRunner uses Image — the OCI image ref pulled into a
|
||||
// fresh container for each conversion (legacy/fallback engine,
|
||||
// kept for environments that already host a podman/docker daemon
|
||||
// and want OCI-image isolation per conversion).
|
||||
//
|
||||
// Both fields are populated by the entry points in convert.go so a
|
||||
// single call site works regardless of which engine is installed.
|
||||
type ToolSpec struct {
|
||||
Image string // OCI image ref (containerRunner)
|
||||
Binary string // binary name on PATH (bwrapRunner)
|
||||
}
|
||||
|
||||
// Runner executes a conversion sub-process and returns its stdout.
|
||||
// The host-side implementations are bwrapRunner (default; wraps
// `bwrap`) and containerRunner (fallback; wraps `podman run` /
// `docker run`). Tests use a fake.
|
||||
//
|
||||
// stdin is piped to the tool's stdin. cmd is the argv passed *to the
|
||||
// tool* — for pandoc the entrypoint accepts pandoc flags directly;
|
||||
// for chromium it accepts chromium-browser flags. mounts is a list
|
||||
// of "<hostPath>:<targetPath>" specs (":ro" is added if no mode
|
||||
// segment is present); each runner translates them to its own
|
||||
// bind/--volume syntax.
|
||||
//
|
||||
// All exec calls in this package go through Runner.Run. This is the
|
||||
// first os/exec site in the codebase; the hardening here is the
|
||||
// pattern for future shell-outs.
|
||||
type Runner interface {
|
||||
Run(ctx context.Context, image string, stdin []byte, mounts []string, cmd []string) ([]byte, error)
|
||||
Run(ctx context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error)
|
||||
}
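The fake mentioned in the doc comment is not shown in this diff. One plausible shape, assuming it simply records each call: only the `tools` field name is taken from the test hunk (`f.tools[0]`); the other fields are hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// ToolSpec and Runner mirror the declarations in the diff.
type ToolSpec struct {
	Image  string // OCI image ref (containerRunner)
	Binary string // binary name on PATH (bwrapRunner)
}

type Runner interface {
	Run(ctx context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error)
}

// fakeRunner records every call and returns canned output.
// Field names other than tools are assumptions for illustration.
type fakeRunner struct {
	tools []ToolSpec
	cmds  [][]string
	out   []byte
	err   error
}

func (f *fakeRunner) Run(_ context.Context, tool ToolSpec, _ []byte, _ []string, cmd []string) ([]byte, error) {
	f.tools = append(f.tools, tool)
	f.cmds = append(f.cmds, append([]string(nil), cmd...)) // copy so callers can reuse cmd
	return f.out, f.err
}

// Compile-time check that the fake satisfies the interface.
var _ Runner = (*fakeRunner)(nil)

func main() {
	f := &fakeRunner{out: []byte("ok")}
	spec := ToolSpec{Image: "docker.io/pandoc/latex:1.0", Binary: "/opt/bin/pandoc"}
	out, err := f.Run(context.Background(), spec, nil, nil, []string{"--to=docx"})
	fmt.Println(string(out), err, f.tools[0].Binary)
}
```

Because both `ToolSpec` fields travel together, a test can assert on `Image` and `Binary` from a single recorded call regardless of which engine would have been installed.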
|
||||
|
||||
// ErrUnavailable means no container runtime is present on the host.
|
||||
|
|
@@ -196,7 +217,7 @@ func newContainerRunner(engine, remoteURL string) *containerRunner {
|
|||
// --network=none + --no-new-privileges the additional defense from
|
||||
// forcing nobody is small and would break alpine-chrome's own
|
||||
// user-data-dir layout.
|
||||
func (cr *containerRunner) Run(ctx context.Context, image string, stdin []byte, mounts []string, cmd []string) ([]byte, error) {
|
||||
func (cr *containerRunner) Run(ctx context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error) {
|
||||
cr.mu.RLock()
|
||||
engine := cr.engine
|
||||
remoteURL := cr.remoteURL
|
||||
|
|
@@ -209,8 +230,9 @@ func (cr *containerRunner) Run(ctx context.Context, image string, stdin []byte,
|
|||
if engine == "" {
|
||||
return nil, ErrUnavailable
|
||||
}
|
||||
image := tool.Image
|
||||
if image == "" {
|
||||
return nil, fmt.Errorf("convert.Run: image is empty")
|
||||
return nil, fmt.Errorf("convert.Run: tool.Image is empty (containerRunner requires an OCI image ref)")
|
||||
}
|
||||
|
||||
runCtx, cancel := context.WithTimeout(ctx, timeout)
|
||||
|
|
@@ -313,6 +335,229 @@ func (cr *containerRunner) Run(ctx context.Context, image string, stdin []byte,
|
|||
return stdoutBuf.Bytes(), nil
|
||||
}
|
||||
|
||||
// ───────────────────────────────────────────────────────────────────────────
|
||||
// bwrapRunner — default conversion engine.
|
||||
//
|
||||
// Wraps `bwrap` (bubblewrap) to run pandoc / chromium binaries directly in a
|
||||
// per-call Linux-namespace sandbox. No daemon, no OCI images, no
|
||||
// privileged outer container. Image-build bundles pandoc + chromium
|
||||
// into the zddc-server image so the binaries are available on PATH;
|
||||
// each conversion gets a fresh set of namespaces, a read-only view
|
||||
// of the host's /usr (so the binary + its libs are visible), a tmpfs
|
||||
// /tmp, and nothing else.
|
||||
//
|
||||
// This matches the threat model of the legacy containerRunner —
|
||||
// untrusted source-markdown drives the binary, we contain any
|
||||
// resulting RCE inside the bwrap sandbox — without the operational
|
||||
// tax of running a container engine per conversion (image pull,
|
||||
// daemon, socket, ~300ms startup).
|
||||
//
|
||||
// Hardening (mirror of containerRunner's flags):
// - --unshare-all, with --share-net never passed → no network
// - --unshare-user-try → user namespace when kernel allows it
// - --die-with-parent → cleanup on zddc-server exit
// - --ro-bind /usr /usr, plus --ro-bind-try for /lib, /lib64, /bin,
//   /sbin and /etc (skipped where absent) → tools + libs visible read-only
// - --proc /proc, --dev /dev → minimal pseudo-filesystems
// - --tmpfs /tmp (256 MiB) → scratch space, matches container path
// - --chdir /tmp → workdir
// - --clearenv + minimal HOME/PATH/LANG → no host env leaks
// - no --cap-add → relies on bwrap's default of dropping capabilities
//   (buildBwrapArgs does not pass --cap-drop ALL explicitly)
|
||||
// ───────────────────────────────────────────────────────────────────────────
|
||||
|
||||
type bwrapRunner struct {
|
||||
mu sync.RWMutex
|
||||
bin string // path to bwrap binary
|
||||
memMiB int // currently advisory; bwrap has no built-in cap
|
||||
cpus string // currently advisory
|
||||
pids int // currently advisory
|
||||
timeout time.Duration // context deadline per Run
|
||||
}
|
||||
|
||||
func newBwrapRunner(bin string) *bwrapRunner {
|
||||
return &bwrapRunner{
|
||||
bin: bin,
|
||||
memMiB: 512,
|
||||
cpus: "2",
|
||||
pids: 100,
|
||||
timeout: 30 * time.Second,
|
||||
}
|
||||
}
|
||||
|
||||
// SetLimits — same shape as containerRunner.SetLimits. bwrap itself
|
||||
// doesn't enforce cgroup limits; we capture the values so an operator
|
||||
// can read them back via /.profile/config or the convert-health probe.
|
||||
// Wrapping with systemd-run --scope --property MemoryMax=… is the
|
||||
// follow-up if hard caps are needed; not in this iteration.
|
||||
func (br *bwrapRunner) SetLimits(memMiB int, cpus string, pids int, timeout time.Duration) {
|
||||
br.mu.Lock()
|
||||
defer br.mu.Unlock()
|
||||
if memMiB > 0 {
|
||||
br.memMiB = memMiB
|
||||
}
|
||||
if cpus != "" {
|
||||
br.cpus = cpus
|
||||
}
|
||||
if pids > 0 {
|
||||
br.pids = pids
|
||||
}
|
||||
if timeout > 0 {
|
||||
br.timeout = timeout
|
||||
}
|
||||
}
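The `systemd-run --scope` follow-up mentioned in the comment could be sketched as an argv prefix. This is an assumption-heavy illustration, not part of the commit: the function name `systemdRunArgv` is hypothetical, it assumes a systemd host where the caller may create transient scopes (unprivileged callers would typically also need `--user`), and it maps the advisory fields onto the `MemoryMax`, `TasksMax` and `CPUQuota` unit properties.

```go
package main

import (
	"fmt"
	"strconv"
)

// systemdRunArgv (hypothetical) prefixes argv with a systemd-run
// transient scope that carries cgroup limits. cpus uses the same
// string form as the runner's field, e.g. "2" or "0.5".
func systemdRunArgv(memMiB int, cpus string, pids int, argv []string) ([]string, error) {
	n, err := strconv.ParseFloat(cpus, 64)
	if err != nil || !(n > 0 && n <= 1024) { // also rejects NaN and Inf
		return nil, fmt.Errorf("invalid cpus value %q", cpus)
	}
	out := []string{
		"systemd-run", "--scope", "--quiet",
		"--property=MemoryMax=" + strconv.Itoa(memMiB) + "M",
		"--property=TasksMax=" + strconv.Itoa(pids),
		"--property=CPUQuota=" + strconv.Itoa(int(n*100)) + "%", // 2 CPUs -> 200%
		"--", // end of systemd-run options; the wrapped command follows
	}
	return append(out, argv...), nil
}

func main() {
	argv, err := systemdRunArgv(512, "2", 100, []string{"bwrap", "--unshare-all", "pandoc"})
	fmt.Println(argv, err)
}
```

Keeping this as a pure argv builder, like `buildBwrapArgs`, would let a test lock in the limit flags without spawning anything.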
|
||||
|
||||
func (br *bwrapRunner) Run(ctx context.Context, tool ToolSpec, stdin []byte, mounts []string, cmd []string) ([]byte, error) {
|
||||
br.mu.RLock()
|
||||
bwrapBin := br.bin
|
||||
timeout := br.timeout
|
||||
br.mu.RUnlock()
|
||||
|
||||
if bwrapBin == "" {
|
||||
return nil, ErrUnavailable
|
||||
}
|
||||
if tool.Binary == "" {
|
||||
return nil, fmt.Errorf("convert.Run: tool.Binary is empty (bwrapRunner requires a host-binary name)")
|
||||
}
|
||||
|
||||
runCtx, cancel := context.WithTimeout(ctx, timeout)
|
||||
defer cancel()
|
||||
|
||||
args, err := buildBwrapArgs(tool.Binary, mounts, cmd)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
c := exec.CommandContext(runCtx, bwrapBin, args...)
|
||||
c.Cancel = func() error {
|
||||
if c.Process == nil {
|
||||
return nil
|
||||
}
|
||||
return c.Process.Kill()
|
||||
}
|
||||
c.WaitDelay = 2 * time.Second
|
||||
c.SysProcAttr = sysProcAttr()
|
||||
c.Env = []string{
|
||||
"PATH=" + os.Getenv("PATH"),
|
||||
"HOME=" + os.TempDir(),
|
||||
}
|
||||
c.Stdin = bytes.NewReader(stdin)
|
||||
|
||||
var stdoutBuf bytes.Buffer
|
||||
c.Stdout = &limitWriter{w: &stdoutBuf, max: 128 << 20}
|
||||
stderr := newRingWriter(4 << 10)
|
||||
c.Stderr = stderr
|
||||
|
||||
if runErr := c.Run(); runErr != nil {
|
||||
exitCode := -1
|
||||
if ee, ok := runErr.(*exec.ExitError); ok {
|
||||
exitCode = ee.ExitCode()
|
||||
}
|
||||
toolName := tool.Binary
|
||||
if runCtx.Err() == context.DeadlineExceeded {
|
||||
return nil, &ConvertError{
|
||||
Tool: toolName,
|
||||
ExitCode: exitCode,
|
||||
Stderr: stderr.String(),
|
||||
Cause: fmt.Errorf("timeout after %s: %w", timeout, runCtx.Err()),
|
||||
}
|
||||
}
|
||||
return nil, &ConvertError{
|
||||
Tool: toolName,
|
||||
ExitCode: exitCode,
|
||||
Stderr: stderr.String(),
|
||||
Cause: runErr,
|
||||
}
|
||||
}
|
||||
return stdoutBuf.Bytes(), nil
|
||||
}
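`Run` caps stdout through a `limitWriter` whose definition lies outside this hunk. A minimal sketch of such a writer, assuming it rejects any write that would push the total past `max`; the real type may instead truncate silently. The field names `w` and `max` come from the call site above; `n`, `errOutputTooLarge` and `writeChunks` are hypothetical.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// errOutputTooLarge is a hypothetical sentinel for this sketch.
var errOutputTooLarge = errors.New("output exceeds limit")

// limitWriter forwards writes to w until max bytes have been accepted.
type limitWriter struct {
	w   io.Writer
	max int64
	n   int64 // bytes accepted so far
}

func (lw *limitWriter) Write(p []byte) (int, error) {
	if lw.n+int64(len(p)) > lw.max {
		return 0, errOutputTooLarge // refuse the whole chunk
	}
	n, err := lw.w.Write(p)
	lw.n += int64(n)
	return n, err
}

// writeChunks feeds chunks through a limitWriter and returns what
// was kept, plus the first error encountered.
func writeChunks(max int64, chunks ...string) (string, error) {
	var buf bytes.Buffer
	lw := &limitWriter{w: &buf, max: max}
	for _, c := range chunks {
		if _, err := io.WriteString(lw, c); err != nil {
			return buf.String(), err
		}
	}
	return buf.String(), nil
}

func main() {
	kept, err := writeChunks(8, "abcd", "efgh", "ijkl")
	fmt.Printf("%q %v\n", kept, err)
}
```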
|
||||
|
||||
// buildBwrapArgs assembles the bwrap argv for a single conversion.
|
||||
// Exposed as a package-internal helper so tests can lock the sandbox
|
||||
// flag shape without exec'ing bwrap. Returns an error when a mount
|
||||
// spec is malformed.
|
||||
func buildBwrapArgs(binary string, mounts, cmd []string) ([]string, error) {
|
||||
args := []string{
|
||||
// Namespace isolation. --unshare-all unshares user (when
|
||||
// available), ipc, pid, net, uts, cgroup; --unshare-user-try
|
||||
// downgrades cleanly when the kernel refuses (e.g. some
|
||||
// container hosts disable user-namespace creation).
|
||||
"--unshare-all",
|
||||
"--unshare-user-try",
|
||||
"--die-with-parent",
|
||||
// Read-only system view. /usr is mandatory (--ro-bind fails if
// the source is missing); the --ro-bind-try entries are skipped
// when the path doesn't exist on the host. On hosts where /lib
// is a symlink into /usr/lib (modern Linux), symlink resolution
// lets bwrap mount /usr's contents through.
|
||||
"--ro-bind", "/usr", "/usr",
|
||||
"--ro-bind-try", "/lib", "/lib",
|
||||
"--ro-bind-try", "/lib64", "/lib64",
|
||||
"--ro-bind-try", "/bin", "/bin",
|
||||
"--ro-bind-try", "/sbin", "/sbin",
|
||||
"--ro-bind-try", "/etc", "/etc",
|
||||
// Pseudo-filesystems. /proc and /dev are required for any
|
||||
// non-trivial binary; we make them minimal.
|
||||
"--proc", "/proc",
|
||||
"--dev", "/dev",
|
||||
// Scratch. 256 MiB tmpfs at /tmp matches containerRunner.
// chromium spills its shared-memory fallback
// (--disable-dev-shm-usage) here, so the budget actually matters.
// --size is a modifier for the next --tmpfs, so it must come first.
"--size", "268435456", // 256 MiB; applies to the following --tmpfs
"--tmpfs", "/tmp",
|
||||
"--chdir", "/tmp",
|
||||
// Minimal env. HOME=/tmp lets chromium write its
|
||||
// user-data-dir without permission errors; PATH covers the
|
||||
// usual install locations for pandoc + chromium across
|
||||
// alpine / debian / rhel.
|
||||
"--clearenv",
|
||||
"--setenv", "HOME", "/tmp",
|
||||
"--setenv", "PATH", "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
|
||||
"--setenv", "LANG", "C.UTF-8",
|
||||
}
|
||||
// Caller-supplied bind mounts (template, output, …). Same
|
||||
// "host:target[:ro|:rw]" syntax as containerRunner; we translate
|
||||
// to bwrap's --ro-bind / --bind.
|
||||
for _, m := range mounts {
|
||||
host, target, mode, ok := splitMount(m)
|
||||
if !ok {
|
||||
return nil, fmt.Errorf("convert.Run: invalid mount spec %q (want host:target[:ro|:rw])", m)
|
||||
}
|
||||
if mode == "rw" {
|
||||
args = append(args, "--bind", host, target)
|
||||
} else {
|
||||
args = append(args, "--ro-bind", host, target)
|
||||
}
|
||||
}
|
||||
// Finally the binary + its argv. A bare name is PATH-resolved
// inside the sandbox via the PATH constructed above; an absolute
// path bypasses PATH lookup and is invoked verbatim, so it must
// live under one of the bind mounts (e.g. /usr) to be visible.
|
||||
args = append(args, binary)
|
||||
args = append(args, cmd...)
|
||||
return args, nil
|
||||
}
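bwrap treats `--size BYTES` as a modifier for the next `--tmpfs`, so the relative order of the two matters. A test helper along the following lines could guard that ordering; the name `sizePrecedesTmpfs` is hypothetical, and the check is deliberately simplified (it does not allow another modifier such as `--perms` between the two).

```go
package main

import "fmt"

// sizePrecedesTmpfs reports whether every "--size" in args is
// followed, two positions later, by "--tmpfs", i.e. the argv reads
// "--size BYTES --tmpfs DEST". Simplification: no other modifier
// option may sit between them.
func sizePrecedesTmpfs(args []string) bool {
	for i, a := range args {
		if a != "--size" {
			continue
		}
		if i+2 >= len(args) || args[i+2] != "--tmpfs" {
			return false
		}
	}
	return true
}

func main() {
	good := []string{"--size", "268435456", "--tmpfs", "/tmp", "--chdir", "/tmp"}
	bad := []string{"--tmpfs", "/tmp", "--size", "268435456", "--chdir", "/tmp"}
	fmt.Println(sizePrecedesTmpfs(good), sizePrecedesTmpfs(bad))
}
```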
|
||||
|
||||
// splitMount parses "host:target[:ro|:rw]" into its three parts.
|
||||
// The mode segment is optional; absent means read-only (matches the
|
||||
// containerRunner default).
|
||||
func splitMount(m string) (host, target, mode string, ok bool) {
|
||||
parts := strings.SplitN(m, ":", 3)
|
||||
if len(parts) < 2 {
|
||||
return "", "", "", false
|
||||
}
|
||||
host = parts[0]
|
||||
target = parts[1]
|
||||
mode = "ro"
|
||||
if len(parts) == 3 {
|
||||
switch parts[2] {
|
||||
case "ro", "rw":
|
||||
mode = parts[2]
|
||||
default:
|
||||
return "", "", "", false
|
||||
}
|
||||
}
|
||||
return host, target, mode, true
|
||||
}
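As written, `splitMount` accepts specs with an empty segment: "/h:" parses with an empty target, and ":" with both segments empty. If that is not intended, a stricter variant could reject them before they reach bwrap. The sketch below is one possible hardening under that assumption; `splitMountStrict` is a hypothetical name, not part of the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// splitMountStrict (hypothetical) parses "host:target[:ro|:rw]" like
// splitMount, but additionally rejects empty host or target segments.
func splitMountStrict(m string) (host, target, mode string, ok bool) {
	parts := strings.SplitN(m, ":", 3)
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return "", "", "", false
	}
	mode = "ro" // absent mode segment defaults to read-only
	if len(parts) == 3 {
		if parts[2] != "ro" && parts[2] != "rw" {
			return "", "", "", false
		}
		mode = parts[2]
	}
	return parts[0], parts[1], mode, true
}

func main() {
	for _, spec := range []string{"/host/x:/x", "/host/pdf:/pdf:rw", "/h:", ":/t", "/h:/t:weird"} {
		host, target, mode, ok := splitMountStrict(spec)
		fmt.Printf("%q -> host=%q target=%q mode=%q ok=%v\n", spec, host, target, mode, ok)
	}
}
```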
|
||||
|
||||
// imageTag extracts a short name for an image reference, used as the
|
||||
// "Tool" label on ConvertError. "docker.io/pandoc/latex:latest" →
|
||||
// "pandoc/latex".
|
||||
|
|
|
|||
|
|
@@ -1511,7 +1511,7 @@ body.is-elevated::after {
|
|||
</svg>
|
||||
<div class="header-title-group">
|
||||
<span class="app-header__title" id="table-title">ZDDC Table</span>
|
||||
<span class="build-timestamp"><span style="color:red;font-weight:bold">v0.0.17-alpha · 2026-05-18 21:36:23 · cff840e-dirty</span></span>
|
||||
<span class="build-timestamp"><span style="color:red;font-weight:bold">v0.0.17-alpha · 2026-05-18 22:38:21 · 85e6eb1-dirty</span></span>
|
||||
</div>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
|
|
|
|||
41
zddc/runtime.Containerfile
Normal file
|
|
@@ -0,0 +1,41 @@
|
|||
# Runtime image for zddc-server.
|
||||
#
|
||||
# Bundles the conversion toolchain (pandoc + chromium + bubblewrap) so
|
||||
# the MD→DOCX/HTML/PDF endpoint works without an external container
|
||||
# engine. The convert package's bwrap engine (production default)
# sandboxes each pandoc/chromium invocation in fresh Linux namespaces;
|
||||
# no daemon, no socket, no privileged outer container, no OCI image
|
||||
# pull at conversion time.
|
||||
#
|
||||
# Used by helm charts (helm/zddc-server-prod/) as the main-container
|
||||
# image. The build is independent of zddc-server itself — the binary
|
||||
# is built by the helm chart's init container from a pinned git ref
|
||||
# and copied into this runtime image's filesystem at start. Image
|
||||
# tags should track the upstream package versions (pandoc, chromium)
|
||||
# more than zddc-server, since the binary is layered in at deploy time.
|
||||
#
|
||||
# Build:
|
||||
# podman build -t zddc-server-runtime:latest \
|
||||
# -f zddc/runtime.Containerfile zddc/
|
||||
#
|
||||
# Publish (example):
|
||||
# podman tag zddc-server-runtime:latest \
|
||||
# codeberg.org/varasys/zddc-server-runtime:vYYYYMMDD
|
||||
# podman push codeberg.org/varasys/zddc-server-runtime:vYYYYMMDD
|
||||
#
|
||||
# Size: ≈ 1 GB unpacked (chromium dominates). Container engines
|
||||
# layer + dedupe the chromium libs across replicas on the same node.
|
||||
FROM docker.io/library/alpine:3
|
||||
|
||||
RUN apk add --no-cache \
|
||||
bubblewrap \
|
||||
pandoc-cli \
|
||||
chromium \
|
||||
font-noto \
|
||||
ca-certificates
|
||||
|
||||
# The init container in helm/zddc-server-*/templates/deployment.yaml
|
||||
# writes the compiled zddc-server binary to /zddc/zddc-server in a
|
||||
# shared emptyDir volume; the main container's command is
|
||||
# `/zddc/zddc-server`. No CMD/ENTRYPOINT here because the binary
|
||||
# path is provided by the chart, not baked into the image.
|
||||