ZDDC/zddc/internal/convert/health.go
ZDDC b5aab81d31 feat(zddc): MD→{docx,html,pdf} server-side conversion via stock pandoc + chromium containers
New endpoint GET /<path>/foo.md?convert=docx|html|pdf renders a markdown
source on demand. Surfaced as the Download buttons in browse's markdown
editor (separate commit).

Execution model — two upstream container images, lazy-pulled:

  • docker.io/pandoc/latex:latest  — MD→DOCX, MD→HTML (entrypoint pandoc)
  • docker.io/zenika/alpine-chrome — HTML→PDF (entrypoint chromium-browser)

No custom image build. The runner passes --pull=missing on every podman/
docker invocation so the operator only needs the runtime installed —
first request pulls the image, subsequent requests use the local cache.
Overrides: --convert-pandoc-image / --convert-chromium-image (and the
matching ZDDC_CONVERT_* env vars). Engine: --convert-engine (podman
preferred, docker fallback). Resource caps: --convert-mem-mib (512),
--convert-cpus (2), --convert-pids (100), --convert-timeout (30s).

PDF flow is two-stage: pandoc renders the markdown through the embedded
viewer-template.html to standalone HTML, then chromium prints that HTML
via --print-to-pdf. Preserves the print-media CSS already authored in
viewer-template.html rather than going through pandoc's LaTeX template.

Each conversion runs in a throw-away container with --rm --network=none
--read-only --tmpfs=/tmp --cap-drop=ALL --security-opt=no-new-privileges
--env=HOME=/tmp plus a bind-mounted scratch dir for I/O. Pandoc reads
markdown from stdin / writes to stdout; the viewer template lives at
/tpl (ro). Chromium reads HTML from a read-write bind mount at /pdf
and writes the PDF to the same mount; the host reads it back. No shell
wrappers, no shell quoting — argv flows straight into each image's
entrypoint.
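
The sandbox flags above can be pictured as one argv slice handed straight to the engine binary. This is a minimal sketch, not the runner's actual code: `buildRunArgs`, `limits`, and the `tplDir` parameter are hypothetical names, and `-i`, `--memory`, `--cpus`, `--pids-limit`, and `--volume` are the standard podman/docker `run` flags assumed here for stdin and the resource caps, since the commit message does not spell them out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// limits mirrors the resource caps named in the commit message.
// The struct itself is hypothetical.
type limits struct {
	MemMiB int
	CPUs   int
	PIDs   int
}

// buildRunArgs returns the argv that follows the engine binary
// (podman or docker). No shell is involved: every element is passed
// to the process as-is, so nothing needs quoting.
func buildRunArgs(image string, lim limits, tplDir string, toolArgs ...string) []string {
	args := []string{
		"run", "--rm", "-i", // -i keeps stdin open for the markdown source (assumed)
		"--pull=missing",
		"--network=none",
		"--read-only",
		"--tmpfs=/tmp",
		"--cap-drop=ALL",
		"--security-opt=no-new-privileges",
		"--env=HOME=/tmp",
		"--memory=" + strconv.Itoa(lim.MemMiB) + "m", // assumed flag spelling
		"--cpus=" + strconv.Itoa(lim.CPUs),           // assumed flag spelling
		"--pids-limit=" + strconv.Itoa(lim.PIDs),     // assumed flag spelling
		"--volume=" + tplDir + ":/tpl:ro",            // viewer template, read-only
	}
	args = append(args, image)
	// Everything after the image goes to the image's entrypoint.
	return append(args, toolArgs...)
}

func main() {
	argv := buildRunArgs(
		"docker.io/pandoc/latex:latest",
		limits{MemMiB: 512, CPUs: 2, PIDs: 100},
		"/var/tmp/zddc-tpl", // hypothetical host path
		"--from=markdown", "--to=html",
	)
	fmt.Println(strings.Join(argv, " "))
}
```

Keeping the image name as the last engine-level element means anything appended afterwards can only reach the tool inside the container, never the engine.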

On-disk cache at <dir>/.converted/<base>.<ext> with mtime synced to the
source. Fast path is a stat-and-serve with no exec; slow path
singleflights concurrent requests for the same target. PUT/DELETE/MOVE
on the source .md purges the .converted/ sidecars.
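
One way to read the stat-and-serve fast path: the cached artifact counts as fresh exactly when its mtime equals the source's. The sketch below assumes that interpretation; `cachePath`, `cacheFresh`, `storeCached`, and `demoCache` are hypothetical helpers, and the singleflight and purge logic are left out.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// cachePath maps <dir>/<base>.md to <dir>/.converted/<base>.<ext>.
func cachePath(source, ext string) string {
	dir, file := filepath.Split(source)
	base := strings.TrimSuffix(file, filepath.Ext(file))
	return filepath.Join(dir, ".converted", base+"."+ext)
}

// cacheFresh is the fast path: two stats, no exec. The artifact is
// fresh when it exists and carries the source's mtime.
func cacheFresh(source, cached string) bool {
	src, err := os.Stat(source)
	if err != nil {
		return false
	}
	out, err := os.Stat(cached)
	if err != nil {
		return false
	}
	return out.ModTime().Equal(src.ModTime())
}

// storeCached writes the artifact, then syncs its mtime to the source's.
func storeCached(source, cached string, data []byte) error {
	src, err := os.Stat(source)
	if err != nil {
		return err
	}
	if err := os.MkdirAll(filepath.Dir(cached), 0o755); err != nil {
		return err
	}
	if err := os.WriteFile(cached, data, 0o644); err != nil {
		return err
	}
	return os.Chtimes(cached, time.Now(), src.ModTime())
}

// demoCache reports freshness right after a store and again after the
// source's mtime has moved (standing in for an edit).
func demoCache() (afterStore, afterEdit bool, err error) {
	dir, err := os.MkdirTemp("", "convert-cache")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	source := filepath.Join(dir, "foo.md")
	if err := os.WriteFile(source, []byte("# hi\n"), 0o644); err != nil {
		return false, false, err
	}
	cached := cachePath(source, "html")
	if err := storeCached(source, cached, []byte("<h1>hi</h1>")); err != nil {
		return false, false, err
	}
	afterStore = cacheFresh(source, cached)

	later := time.Now().Add(time.Hour)
	if err := os.Chtimes(source, later, later); err != nil {
		return false, false, err
	}
	return afterStore, cacheFresh(source, cached), nil
}

func main() {
	fmt.Println(demoCache())
}
```

Comparing for equality rather than "cache newer than source" also catches a source restored from backup with an older timestamp.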

Per-project template variables (client/project/contractor/project_number)
come from a new .zddc `convert:` cascade block, walked leaf→root with
per-key latest-wins. Filename-derived variables (title, tracking_number,
revision, status, is_draft) come from a new zddc.ParseFilename helper.

If neither podman nor docker is on PATH, the endpoint serves 503 with
a clear Retry-After. The rest of the server keeps working.
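
A minimal sketch of the degraded response, assuming a plain net/http handler. `unavailable` and `probeStatus` are hypothetical names, and the 60-second Retry-After value is an assumption chosen to line up with the reprobe cooldown in health.go; the commit message does not state the actual value.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// unavailable writes a 503 whose body is the human-readable reason
// and whose Retry-After header tells clients when to try again.
// The 60-second value is an assumption.
func unavailable(w http.ResponseWriter, reason string) {
	w.Header().Set("Retry-After", "60")
	http.Error(w, reason, http.StatusServiceUnavailable)
}

// probeStatus exercises unavailable in-process and returns the status
// code and Retry-After header a client would see.
func probeStatus(reason string) (int, string) {
	rec := httptest.NewRecorder()
	unavailable(rec, reason)
	res := rec.Result()
	defer res.Body.Close()
	return res.StatusCode, res.Header.Get("Retry-After")
}

func main() {
	code, retry := probeStatus("no container runtime (podman or docker) found on PATH")
	fmt.Println(code, retry)
}
```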

This is the first os/exec site in the codebase. The hardening in
internal/convert/runner.go — context.CancelFunc → process kill,
cmd.WaitDelay, platform-specific SysProcAttr (Setpgid + Pdeathsig on
Linux), minimal env, stdout cap via limitWriter, stderr ring buffer —
sets the pattern for any future shell-outs.
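
Two of these pieces, the stdout cap and the context-driven kill, can be sketched with the standard library alone. This is an illustration under assumptions, not the code in runner.go: this `limitWriter` silently drops overflow (the real one may return an error instead), `runCapped` and `capDemo` are hypothetical helpers, the `PATH` value and two-second `WaitDelay` are placeholders, and the Linux-only `SysProcAttr` settings are omitted to keep the sketch portable.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"os/exec"
	"time"
)

// limitWriter forwards at most `remaining` bytes to w and drops the
// rest, remembering that truncation happened.
type limitWriter struct {
	w         io.Writer
	remaining int
	truncated bool
}

func (l *limitWriter) Write(p []byte) (int, error) {
	n := len(p)
	if len(p) > l.remaining {
		p = p[:l.remaining]
		l.truncated = true
	}
	if len(p) > 0 {
		if _, err := l.w.Write(p); err != nil {
			return 0, err
		}
		l.remaining -= len(p)
	}
	// Report the full length so the copy from the child's pipe keeps
	// draining instead of failing with a short write.
	return n, nil
}

// runCapped runs a command with a minimal environment and capped
// stdout. Cancelling ctx kills the process; WaitDelay bounds how long
// Run waits afterwards for the output pipes to close.
func runCapped(ctx context.Context, limit int, name string, args ...string) ([]byte, bool, error) {
	var out bytes.Buffer
	lw := &limitWriter{w: &out, remaining: limit}
	cmd := exec.CommandContext(ctx, name, args...)
	cmd.Env = []string{"PATH=/usr/bin:/bin"} // placeholder minimal env
	cmd.Stdout = lw
	cmd.WaitDelay = 2 * time.Second // placeholder value
	err := cmd.Run()
	return out.Bytes(), lw.truncated, err
}

// capDemo feeds chunks through a limitWriter without spawning a process.
func capDemo(limit int, chunks ...string) (string, bool) {
	var out bytes.Buffer
	lw := &limitWriter{w: &out, remaining: limit}
	for _, c := range chunks {
		io.WriteString(lw, c)
	}
	return out.String(), lw.truncated
}

func main() {
	kept, truncated := capDemo(5, "abc", "defg")
	fmt.Println(kept, truncated)
}
```

A caller would typically wrap `runCapped` in `context.WithTimeout` so the per-conversion timeout and client disconnects share one cancellation path.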

Public surface:
  convert.ToDocx(ctx, source, meta) / .ToHTML / .ToPDF
  convert.Probe(ctx, engineOverride) → install Runner if engine present
  convert.SetImages(pandoc, chromium)
  convert.ConfigureLimits(memMiB, cpus, pids, timeout)
  convert.Available()

Container handler at internal/handler/converthandler.go; dispatcher
branch in cmd/zddc-server/main.go inserts the convert lookup after the
existing ACL gate, reusing the source file's read policy verbatim.
2026-05-13 10:33:56 -05:00

package convert

import (
	"context"
	"fmt"
	"log/slog"
	"os/exec"
	"strings"
	"sync"
	"sync/atomic"
	"time"
)

// Capabilities is the snapshot of "can we convert right now?". The
// only hard requirement is a container runtime on PATH; image presence
// is left to `--pull=missing` at conversion time, so a missing image
// surfaces as a normal ConvertError (not a probe failure).
type Capabilities struct {
	Engine        string // resolved path to the engine binary (podman or docker); "" if none found
	EngineVer     string // first line of "<engine> --version"
	PandocImage   string // resolved pandoc image ref
	ChromiumImage string // resolved chromium image ref
	ProbedAt      time.Time
	Err           error
}

// Ready reports whether conversions can be attempted. The first
// conversion may still fail if the configured image isn't reachable
// from the host's registry (the runner will surface a clear error
// from podman/docker stderr).
func (c Capabilities) Ready() bool {
	return c.Engine != "" && c.Err == nil
}

// Reason returns a short human-friendly explanation when Ready() is
// false. Used as the body of a 503.
func (c Capabilities) Reason() string {
	if c.Engine == "" {
		return "no container runtime (podman or docker) found on PATH"
	}
	if c.Err != nil {
		return c.Err.Error()
	}
	return "unavailable"
}

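var (
	// caps holds the most recent probe result; nil until Probe has run.
	caps atomic.Pointer[Capabilities]
	// probeCool serializes probes so only one runs at a time.
	probeCool sync.Mutex
)
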
// Available returns the current Capabilities snapshot and whether
// conversions can proceed.
func Available() (Capabilities, bool) {
	p := caps.Load()
	if p == nil {
		return Capabilities{}, false
	}
	return *p, p.Ready()
}

// Probe locates the container engine and installs a containerRunner
// as the package default. Call once at server startup. Returns the
// captured Capabilities for logging.
//
// Engine selection: a non-empty engineOverride is the only candidate
// (no fallback); otherwise podman is tried first, then docker. Image
// presence is NOT probed: the runner uses `--pull=missing`, so the
// first conversion request pulls whichever image it needs.
//
// Any failure here is non-fatal: the server still starts and the
// conversion endpoints return 503. A missing container runtime only
// disables conversions.
func Probe(ctx context.Context, engineOverride string) Capabilities {
	probeCool.Lock()
	defer probeCool.Unlock()
	now := time.Now()
	c := Capabilities{
		PandocImage:   currentPandocImage(),
		ChromiumImage: currentChromiumImage(),
		ProbedAt:      now,
	}
	engine := resolveEngine(engineOverride)
	if engine == "" {
		c.Err = fmt.Errorf("no container runtime found (tried: %s)", strings.Join(enginesTried(engineOverride), ", "))
		caps.Store(&c)
		slog.Warn("convert: probe failed", "reason", c.Err.Error())
		return c
	}
	c.Engine = engine
	// A failed version lookup is cosmetic: EngineVer stays empty.
	if v, err := probeVersion(ctx, engine); err == nil {
		c.EngineVer = v
	}
	InstallRunner(newContainerRunner(engine))
	caps.Store(&c)
	slog.Info("convert: ready",
		"engine", engine,
		"engine_version", c.EngineVer,
		"pandoc_image", c.PandocImage,
		"chromium_image", c.ChromiumImage)
	return c
}

// Reprobe re-runs Probe with the existing configuration. Used by the
// handler when a request hits a not-Ready state, giving the operator
// a way to recover (e.g. installed podman after the server started)
// without a server restart. A 60 s cooldown between probes keeps
// error-path requests cheap.
func Reprobe(ctx context.Context, engineOverride string) Capabilities {
	if p := caps.Load(); p != nil {
		if time.Since(p.ProbedAt) < 60*time.Second {
			return *p
		}
	}
	return Probe(ctx, engineOverride)
}

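// resolveEngine returns the resolved path of the container engine, or
// "" if none is found. A non-empty override is the only candidate;
// otherwise podman is preferred over docker.
func resolveEngine(override string) string {
	if override != "" {
		if p, err := exec.LookPath(override); err == nil {
			return p
		}
		return ""
	}
	for _, name := range []string{"podman", "docker"} {
		if p, err := exec.LookPath(name); err == nil {
			return p
		}
	}
	return ""
}
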
// enginesTried lists the engine names resolveEngine would look up,
// for use in error messages.
func enginesTried(override string) []string {
	if override != "" {
		return []string{override}
	}
	return []string{"podman", "docker"}
}

// probeVersion runs "<engine> --version" and returns the first line
// of its combined stdout/stderr output.
func probeVersion(ctx context.Context, engine string) (string, error) {
	c := exec.CommandContext(ctx, engine, "--version")
	out, err := c.CombinedOutput()
	if err != nil {
		return "", err
	}
	line := strings.SplitN(strings.TrimSpace(string(out)), "\n", 2)[0]
	return line, nil
}