feat(zddc): MD→{docx,html,pdf} server-side conversion via stock pandoc + chromium containers

New endpoint GET /<path>/foo.md?convert=docx|html|pdf renders a markdown
source on demand. Surfaced as the Download buttons in browse's markdown
editor (separate commit).

Execution model — two upstream container images, lazy-pulled:

  • docker.io/pandoc/latex:latest  — MD→DOCX, MD→HTML (entrypoint pandoc)
  • docker.io/zenika/alpine-chrome — HTML→PDF (entrypoint chromium-browser)

No custom image build. The runner passes --pull=missing on every podman/
docker invocation so the operator only needs the runtime installed —
first request pulls the image, subsequent requests use the local cache.
Overrides: --convert-pandoc-image / --convert-chromium-image (and the
matching ZDDC_CONVERT_* env vars). Engine: --convert-engine (podman
preferred, docker fallback). Resource caps: --convert-mem-mib (512),
--convert-cpus (2), --convert-pids (100), --convert-timeout (30s).

PDF flow is two-stage: pandoc renders the markdown through the embedded
viewer-template.html to standalone HTML, then chromium prints that HTML
via --print-to-pdf. This preserves the print-media CSS already authored in
viewer-template.html rather than going through pandoc's LaTeX template.

Each conversion runs in a throw-away container with --rm --network=none
--read-only --tmpfs=/tmp --cap-drop=ALL --security-opt=no-new-privileges
--env=HOME=/tmp plus a bind-mounted scratch dir for I/O. Pandoc reads
markdown from stdin / writes to stdout; the viewer template lives at
/tpl (ro). Chromium reads HTML from a read-write bind mount at /pdf
and writes the PDF to the same mount; the host reads it back. No shell
wrappers, no shell quoting — argv flows straight into each image's
entrypoint.

On-disk cache at <dir>/.converted/<base>.<ext> with mtime synced to the
source. Fast path is a stat-and-serve with no exec; slow path
singleflights concurrent requests for the same target. PUT/DELETE/MOVE
on the source .md purges the .converted/ sidecars.

Per-project template variables (client/project/contractor/project_number)
come from a new .zddc `convert:` cascade block, walked leaf→root with
per-key latest-wins. Filename-derived variables (title, tracking_number,
revision, status, is_draft) come from a new zddc.ParseFilename helper.

If neither podman nor docker is on PATH, the endpoint serves 503 with
a clear Retry-After. The rest of the server keeps working.

This is the first os/exec site in the codebase. The hardening in
internal/convert/runner.go — context.CancelFunc → process kill,
cmd.WaitDelay, platform-specific SysProcAttr (Setpgid + Pdeathsig on
Linux), minimal env, stdout cap via limitWriter, stderr ring buffer —
sets the pattern for any future shell-outs.

Public surface:
  convert.ToDocx(ctx, source, meta) / .ToHTML / .ToPDF
  convert.Probe(ctx, engineOverride) → install Runner if engine present
  convert.SetImages(pandoc, chromium)
  convert.ConfigureLimits(memMiB, cpus, pids, timeout)
  convert.Available()

Container handler at internal/handler/converthandler.go; dispatcher
branch in cmd/zddc-server/main.go inserts the convert lookup after the
existing ACL gate, reusing the source file's read policy verbatim.
ZDDC 2026-05-13 10:33:56 -05:00
parent b34edcecac
commit b5aab81d31
20 changed files with 3150 additions and 14 deletions


@@ -2,6 +2,44 @@
A collection of tools for converting Markdown documents to HTML with a professional viewer interface, optimized for technical documentation and engineering documents.
## Server-side conversion (`zddc-server`)
zddc-server can offer the same conversions on demand: a `.md` file in any
served directory becomes downloadable as `.docx`, `.html`, and `.pdf` via the
`?convert=` query parameter, surfaced as Download buttons in the browse app's
markdown editor.
The server shells out to two upstream container images, pulling each on
first use via `--pull=missing`. No custom image build is required —
operators just install `podman` (preferred) or `docker`, and the first
conversion request pulls the image:
- `docker.io/pandoc/latex:latest` — MD → DOCX and MD → HTML
(override: `--convert-pandoc-image=` or `ZDDC_CONVERT_PANDOC_IMAGE`;
switch to `docker.io/pandoc/core:latest` for a ~90% size reduction
if you don't need pandoc's native LaTeX-PDF path)
- `docker.io/zenika/alpine-chrome:latest` — HTML → PDF
(override: `--convert-chromium-image=` or `ZDDC_CONVERT_CHROMIUM_IMAGE`)
The PDF flow is two-stage: pandoc renders the markdown through
`viewer-template.html` to standalone HTML, then headless Chromium
prints that HTML to PDF. This preserves the existing print-media CSS
authored for the viewer template rather than going through pandoc's
LaTeX template.
If neither podman nor docker is on PATH the endpoint serves 503 with
a clear "no container runtime" message. Engine choice is overridable
via `--convert-engine=` or `ZDDC_CONVERT_ENGINE`.
Resource limits are per-container and configurable: `--convert-mem-mib`
(default 512), `--convert-cpus` (default "2"), `--convert-pids`
(default 100), `--convert-timeout` (default 30s).
Each conversion runs in a throw-away container with
`--rm --network=none --read-only --tmpfs=/tmp --cap-drop=ALL
--security-opt=no-new-privileges` plus a bind-mounted scratch dir
for I/O (read-only for the template; read-write for the PDF output).
## Features
### Document Conversion (`convert`)


@@ -19,6 +19,7 @@ import (
"codeberg.org/VARASYS/ZDDC/zddc/internal/auth"
"codeberg.org/VARASYS/ZDDC/zddc/internal/cache"
"codeberg.org/VARASYS/ZDDC/zddc/internal/config"
"codeberg.org/VARASYS/ZDDC/zddc/internal/convert"
appfs "codeberg.org/VARASYS/ZDDC/zddc/internal/fs"
"codeberg.org/VARASYS/ZDDC/zddc/internal/handler"
"codeberg.org/VARASYS/ZDDC/zddc/internal/policy"
@@ -86,6 +87,19 @@ func main() {
"addr", cfg.Addr,
"embedded_apps", embeddedVersionsForLog(embedded))
// Probe the container runtime for the MD→{docx,html,pdf} endpoint.
// Non-fatal: if the host has no podman/docker, conversion requests
// return 503 and everything else keeps working. The probe installs
// the package-level Runner when an engine is found; the configured
// image refs are pulled lazily on first conversion via
// `--pull=missing` so there's no manual setup beyond installing
// podman or docker.
convert.SetImages(cfg.ConvertPandocImage, cfg.ConvertChromiumImage)
probeCtx, probeCancel := context.WithTimeout(context.Background(), 5*time.Second)
convert.Probe(probeCtx, cfg.ConvertEngine)
probeCancel()
convert.ConfigureLimits(cfg.ConvertMemMiB, cfg.ConvertCPUs, cfg.ConvertPIDs, cfg.ConvertTimeout)
// Client mode short-circuit: when cfg.Upstream is set, this binary
// runs as a downstream proxy/cache/mirror rather than a master.
// The master-side machinery below (archive index, watcher, apps
@@ -472,7 +486,7 @@ func setupAccessAuditLog(path string) *slog.Logger {
// through unchanged when the client doesn't advertise gzip), appends
// Vary: Accept-Encoding automatically, and passes through 304s untouched.
// Yields ~75% size reduction on the larger embedded HTML responses
-// (mdedit: 920 KB → ~250 KB on the wire).
+// (browse: ~2 MB → a few hundred KB on the wire).
//
// Extracted so tests can construct an equivalent wrapper without going
// through the full main() server boot.
@@ -981,9 +995,9 @@ func dispatch(cfg config.Config, idx *archive.Index, ring *handler.LogRing, apps
// File doesn't exist at this path. If the URL matches one of
// the canonical app HTML names AND the request directory is
// one where that app is available (working/staging/incoming
-// for classifier, working for mdedit, staging for
-// transmittal, anywhere for archive, root only for landing),
-// resolve via the apps subsystem.
+// for classifier, staging for transmittal, anywhere for
+// archive + browse, root only for landing), resolve via the
+// apps subsystem.
if appsSrv != nil {
if app, requestDirRel := apps.MatchAppHTML(urlPath); app != "" {
requestDir := filepath.Join(cfg.Root, filepath.FromSlash(requestDirRel))
@@ -1002,13 +1016,14 @@ func dispatch(cfg config.Config, idx *archive.Index, ring *handler.LogRing, apps
// a virtual view. The shape rule mirrors the other canonical
// folders (slash → browse, no-slash → default tool):
// - JSON request, any depth → aggregator listing (handler.ServeReviewing)
-// - HTML, no slash → mdedit (default tool, via DefaultAppAt)
+// - HTML, no slash → browse (default tool, via DefaultAppAt;
+// browse hosts the markdown editor plugin)
// - HTML, with slash → browse.html (via ServeDirectory).
// browse fetches JSON which routes back
// through here to ServeReviewing.
// Depth-3 no-slash (reviewing/<tracking>) 302s to the slash form.
// Depth-2 no-slash (reviewing) falls through to the canonical-
-// folder block below where DefaultAppAt routes to mdedit.
+// folder block below where DefaultAppAt routes to browse.
if r.Method == http.MethodGet || r.Method == http.MethodHead {
if proj, tracking, sidePath, ok := handler.IsReviewingPath(urlPath); ok {
if !strings.HasSuffix(urlPath, "/") {
@@ -1098,9 +1113,10 @@ func dispatch(cfg config.Config, idx *archive.Index, ring *handler.LogRing, apps
// directory view (handler.ServeDirectory → DirTool, which
// resolves to browse by default; JSON requests always get the
// raw listing regardless). No trailing slash → the directory's
-// default_tool ("specialized app") — mdedit under working/,
-// transmittal under staging/, archive under archive/, tables
-// under archive/<party>/mdl/ — if one is declared; otherwise
+// default_tool ("specialized app") — browse under working/+
+// reviewing/ (hosts the markdown editor), transmittal under
+// staging/, archive under archive/, tables under
+// archive/<party>/mdl/ — if one is declared; otherwise
// (after the project-root landing case below) a 302 to the
// slash form.
if !strings.HasSuffix(urlPath, "/") && (r.Method == http.MethodGet || r.Method == http.MethodHead) && !isRoot {
@@ -1138,6 +1154,17 @@ func dispatch(cfg config.Config, idx *archive.Index, ring *handler.LogRing, apps
http.Error(w, "Forbidden", http.StatusForbidden)
return
}
// MD→{docx,html,pdf} on-demand conversion. The endpoint reuses the
// source file's read policy (already gated above), so no separate
// ACL verb. Only .md sources are convertible; everything else falls
// through to the regular file serve.
if fmt := r.URL.Query().Get("convert"); fmt != "" &&
strings.HasSuffix(strings.ToLower(absPath), ".md") {
handler.ServeConverted(cfg, w, r, absPath, fmt, chain)
return
}
handler.ServeFile(w, r, absPath)
}


@@ -47,6 +47,20 @@ type Config struct {
MaxWriteBytes int64 // --max-write-bytes / ZDDC_MAX_WRITE_BYTES — upper bound on PUT body size. Default 256 MiB. Per-request limit; rejected with 413.
CascadeMode string // --cascade-mode / ZDDC_CASCADE_MODE — "delegated" (default; leaf grants override ancestor denies) or "strict" (ancestor explicit-denies are absolute, NIST AC-6).
ArchiveRescanInterval time.Duration // --archive-rescan-interval / ZDDC_ARCHIVE_RESCAN_INTERVAL — periodic full re-walk of the archive index. Covers SMB/CIFS where inotify misses cross-client writes. Default 60s; 0 to disable.
// MD→{docx,html,pdf} conversion endpoint (see internal/convert).
// The server shells out to upstream pandoc + chromium container
// images via podman or docker, pulling each on first use via
// `--pull=missing`. No custom image build is required — only that
// podman or docker is on PATH and the configured image refs are
// reachable. If no runtime is found the endpoint serves 503.
ConvertPandocImage string // --convert-pandoc-image / ZDDC_CONVERT_PANDOC_IMAGE — image for MD→DOCX/HTML. Default docker.io/pandoc/latex:latest.
ConvertChromiumImage string // --convert-chromium-image / ZDDC_CONVERT_CHROMIUM_IMAGE — image for HTML→PDF. Default docker.io/zenika/alpine-chrome:latest.
ConvertEngine string // --convert-engine / ZDDC_CONVERT_ENGINE — override engine binary (default: probe for podman, then docker).
ConvertMemMiB int // --convert-mem-mib / ZDDC_CONVERT_MEM_MIB — per-container memory cap in MiB. Default 512.
ConvertCPUs string // --convert-cpus / ZDDC_CONVERT_CPUS — per-container CPU limit. Default "2".
ConvertPIDs int // --convert-pids / ZDDC_CONVERT_PIDS — per-container PID limit. Default 100.
ConvertTimeout time.Duration // --convert-timeout / ZDDC_CONVERT_TIMEOUT — per-conversion wall clock. Default 30s.
}
// ErrHelpRequested is returned by Load when --help is passed; the caller
@@ -127,6 +141,20 @@ func Load(args []string) (Config, error) {
"ACL cascade evaluation mode: \"delegated\" (default — subtree allow can override ancestor deny) or \"strict\" (ancestor explicit-deny is absolute; NIST AC-6).")
archiveRescanIntervalFlag := fs.Duration("archive-rescan-interval", parseDurationOrDefault(os.Getenv("ZDDC_ARCHIVE_RESCAN_INTERVAL"), 60*time.Second),
"Periodic full re-walk of the archive index. Required on SMB/CIFS-backed roots where inotify misses cross-client writes. Default 60s; set 0 to disable.")
convertPandocImageFlag := fs.String("convert-pandoc-image", getEnv("ZDDC_CONVERT_PANDOC_IMAGE", "docker.io/pandoc/latex:latest"),
"Pandoc container image for MD→DOCX and MD→HTML. Pulled on first use via --pull=missing.")
convertChromiumImageFlag := fs.String("convert-chromium-image", getEnv("ZDDC_CONVERT_CHROMIUM_IMAGE", "docker.io/zenika/alpine-chrome:latest"),
"Headless Chromium container image for HTML→PDF. Pulled on first use via --pull=missing.")
convertEngineFlag := fs.String("convert-engine", os.Getenv("ZDDC_CONVERT_ENGINE"),
"Container engine override (default: probe for podman, then docker).")
convertMemMiBFlag := fs.Int("convert-mem-mib", parseIntOrDefault(os.Getenv("ZDDC_CONVERT_MEM_MIB"), 512),
"Per-conversion container memory limit in MiB. Default 512.")
convertCPUsFlag := fs.String("convert-cpus", getEnv("ZDDC_CONVERT_CPUS", "2"),
"Per-conversion container CPU limit (passed to --cpus). Default 2.")
convertPIDsFlag := fs.Int("convert-pids", parseIntOrDefault(os.Getenv("ZDDC_CONVERT_PIDS"), 100),
"Per-conversion container PID limit. Default 100.")
convertTimeoutFlag := fs.Duration("convert-timeout", parseDurationOrDefault(os.Getenv("ZDDC_CONVERT_TIMEOUT"), 30*time.Second),
"Per-conversion wall-clock timeout. Default 30s.")
accessLogFlag := fs.String("access-log", os.Getenv("ZDDC_ACCESS_LOG"),
"Tee structured access logs to this file (JSON, size-rotated). "+
"Default: <ZDDC_ROOT>/.zddc.d/logs/access-<hostname>.log. "+
@@ -199,6 +227,13 @@ func Load(args []string) (Config, error) {
MaxWriteBytes: *maxWriteBytesFlag,
CascadeMode: *cascadeModeFlag,
ArchiveRescanInterval: *archiveRescanIntervalFlag,
ConvertPandocImage: *convertPandocImageFlag,
ConvertChromiumImage: *convertChromiumImageFlag,
ConvertEngine: *convertEngineFlag,
ConvertMemMiB: *convertMemMiBFlag,
ConvertCPUs: *convertCPUsFlag,
ConvertPIDs: *convertPIDsFlag,
ConvertTimeout: *convertTimeoutFlag,
}
// Default Root to the current working directory.
@@ -494,3 +529,14 @@ func parseInt64OrDefault(s string, def int64) int64 {
}
return def
}
func parseIntOrDefault(s string, def int) int {
if s == "" {
return def
}
var n int
if _, err := fmt.Sscan(s, &n); err == nil {
return n
}
return def
}


@@ -0,0 +1,253 @@
// Package convert turns a markdown source byte-buffer into DOCX, HTML,
// or PDF via two stock upstream container images: pandoc (default
// `docker.io/pandoc/latex:latest`) handles MD→DOCX and MD→HTML, and
// a headless-chromium image (default `docker.io/zenika/alpine-chrome:latest`)
// handles HTML→PDF. No custom image build is required — the operator
// just needs `podman` or `docker` on PATH and the runner pulls each
// image on first use via `--pull=missing`.
//
// Public surface:
//
// ToDocx(ctx, source, meta) → []byte (DOCX bytes)
// ToHTML(ctx, source, meta) → []byte (standalone HTML)
// ToPDF (ctx, source, meta) → []byte (PDF, via HTML + chromium)
//
// Probe(ctx, override) → Capabilities (call once at startup)
// Available() → (Capabilities, bool)
// SetImages(pandoc, chromium) — install image refs from config
//
// All three converters are safe for concurrent use; each call gets a
// fresh container. The pandoc image's entrypoint is `pandoc`, so the
// argv we pass after the image flows straight into pandoc. The
// alpine-chrome image's entrypoint is `chromium-browser`, so the argv
// flows into chromium-browser. No `sh -c` wrappers, no shell quoting.
//
// Metadata maps to the placeholders consumed by viewer-template.html.
// title/tracking_number/revision/status/is_draft typically come from
// the source filename (zddc.ParseFilename); client/project/contractor/
// project_number from the .zddc cascade `convert:` block.
package convert
import (
"context"
"fmt"
"os"
"path/filepath"
"strings"
"sync/atomic"
"time"
)
// Metadata is the variable bag passed to pandoc as `--variable k=v`
// pairs. Fields with zero values are omitted. The viewer-template.html
// uses `$if(field)$ … $endif$` blocks so absent fields render cleanly.
type Metadata struct {
Title string
TrackingNumber string
Revision string
Status string
Client string
Project string
Contractor string
ProjectNumber string
GenerationTime time.Time
IsDraft bool
NoTOC bool
}
// Default images. Operator overrides via --convert-pandoc-image /
// --convert-chromium-image (see cmd/zddc-server). pandoc/latex carries
// TeX Live for native PDF too, so it's a superset of pandoc/core;
// operators wanting a slimmer footprint can switch to pandoc/core.
const (
DefaultPandocImage = "docker.io/pandoc/latex:latest"
DefaultChromiumImage = "docker.io/zenika/alpine-chrome:latest"
)
var (
pandocImage atomic.Pointer[string]
chromiumImage atomic.Pointer[string]
)
// SetImages installs the image refs used for subsequent ToDocx/ToHTML/
// ToPDF calls. Empty values keep the previous setting (or the
// DefaultPandocImage / DefaultChromiumImage constants on first call).
// Called from cmd/zddc-server/main.go after flag parsing.
func SetImages(pandoc, chromium string) {
if pandoc != "" {
s := pandoc
pandocImage.Store(&s)
}
if chromium != "" {
s := chromium
chromiumImage.Store(&s)
}
}
func currentPandocImage() string {
if p := pandocImage.Load(); p != nil && *p != "" {
return *p
}
return DefaultPandocImage
}
func currentChromiumImage() string {
if p := chromiumImage.Load(); p != nil && *p != "" {
return *p
}
return DefaultChromiumImage
}
// ToDocx renders source markdown to DOCX bytes. One container run via
// the pandoc image. Caller passes the full file content (envelope +
// body); pandoc handles `markdown+yaml_metadata_block` natively.
func ToDocx(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
r := currentRunner()
if r == nil {
return nil, ErrUnavailable
}
cmd := []string{
"--from=markdown+yaml_metadata_block",
"--to=docx",
"--output=-",
}
cmd = append(cmd, metadataArgs(m)...)
cmd = append(cmd, "-")
return r.Run(ctx, currentPandocImage(), source, nil, cmd)
}
// ToHTML renders source markdown to standalone HTML using
// viewer-template.html. Embeds CSS + images via --embed-resources.
// Template + custom.css are bind-mounted into the container at /tpl
// from a per-call scratch dir.
func ToHTML(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
r := currentRunner()
if r == nil {
return nil, ErrUnavailable
}
scratch, err := writeAssetsToScratch()
if err != nil {
return nil, fmt.Errorf("scratch: %w", err)
}
defer os.RemoveAll(scratch)
cmd := []string{
"--from=markdown+yaml_metadata_block",
"--to=html5",
"--standalone",
"--embed-resources",
"--section-divs",
"--id-prefix=",
"--html-q-tags",
"--template=/tpl/viewer-template.html",
}
if !m.NoTOC {
cmd = append(cmd, "--toc", "--toc-depth=6")
}
cmd = append(cmd, metadataArgs(m)...)
cmd = append(cmd, "--output=-", "-")
mounts := []string{scratch + ":/tpl:ro"}
return r.Run(ctx, currentPandocImage(), source, mounts, cmd)
}
// ToPDF renders source markdown to PDF in two stages: pandoc produces
// HTML using viewer-template.html (stage 1, pandoc image), then headless
// Chromium prints that HTML to PDF (stage 2, chromium image). The
// two-stage choice preserves the print-media CSS already authored in
// viewer-template.html — pandoc's native --pdf-engine path uses LaTeX
// which would bypass it entirely.
//
// Chromium runs from the alpine-chrome image whose entrypoint is
// `chromium-browser`; our cmd is the flag list passed straight to that
// binary. The host scratch dir is bind-mounted read-write at /pdf so
// chromium can write out.pdf and we read it back afterward.
func ToPDF(ctx context.Context, source []byte, m Metadata) ([]byte, error) {
html, err := ToHTML(ctx, source, m)
if err != nil {
return nil, err
}
r := currentRunner()
if r == nil {
return nil, ErrUnavailable
}
scratch, err := os.MkdirTemp("", "zddc-pdf-")
if err != nil {
return nil, fmt.Errorf("scratch: %w", err)
}
defer os.RemoveAll(scratch)
htmlPath := filepath.Join(scratch, "in.html")
pdfPath := filepath.Join(scratch, "out.pdf")
if err := os.WriteFile(htmlPath, html, 0o644); err != nil {
return nil, fmt.Errorf("write html: %w", err)
}
if err := chmodTree(scratch, 0o755, 0o644); err != nil {
return nil, err
}
mounts := []string{scratch + ":/pdf:rw"}
// alpine-chrome's entrypoint is `chromium-browser`. --no-sandbox is
// required because the container drops CAP_SYS_ADMIN; the threat
// model is "malicious markdown drives chromium RCE", contained by
// --network=none + --cap-drop=ALL + --read-only + tmpfs.
cmd := []string{
"--headless",
"--disable-gpu",
"--no-sandbox",
"--user-data-dir=/tmp/chrome",
"--no-pdf-header-footer",
"--virtual-time-budget=10000",
"--print-to-pdf=/pdf/out.pdf",
"file:///pdf/in.html",
}
if _, err := r.Run(ctx, currentChromiumImage(), nil, mounts, cmd); err != nil {
return nil, err
}
out, err := os.ReadFile(pdfPath)
if err != nil {
return nil, fmt.Errorf("read pdf: %w", err)
}
if len(out) < 4 || string(out[:4]) != "%PDF" {
return nil, &ConvertError{
Tool: "chromium",
ExitCode: 0,
Stderr: "chromium did not produce a valid PDF",
Cause: fmt.Errorf("invalid PDF magic in output (got %d bytes)", len(out)),
}
}
return out, nil
}
// metadataArgs renders Metadata into pandoc -V flags. Order is stable
// so test fixtures don't churn. Empty values are omitted (the template
// uses $if(...)$ blocks).
func metadataArgs(m Metadata) []string {
var out []string
add := func(k, v string) {
v = strings.TrimSpace(v)
if v == "" {
return
}
out = append(out, "-V", k+"="+v)
}
add("title", m.Title)
add("tracking_number", m.TrackingNumber)
add("revision", m.Revision)
add("status", m.Status)
add("client", m.Client)
add("project", m.Project)
add("contractor", m.Contractor)
add("project_number", m.ProjectNumber)
if !m.GenerationTime.IsZero() {
add("generation_time", m.GenerationTime.Format("January 02, 2006 at 3:04:05 PM MST"))
}
if m.IsDraft {
add("is_draft", "true")
}
if m.NoTOC {
add("no-toc", "true")
}
return out
}


@@ -0,0 +1,286 @@
package convert
import (
"context"
"errors"
"strings"
"sync"
"testing"
"time"
)
// fakeRunner records the args it was invoked with and replays canned
// responses. Lets us assert the command lines + image refs without
// needing podman.
type fakeRunner struct {
mu sync.Mutex
calls [][]string
images []string
stdin [][]byte
mounts [][]string
resp []byte
err error
}
func (f *fakeRunner) Run(_ context.Context, image string, stdin []byte, mounts []string, cmd []string) ([]byte, error) {
f.mu.Lock()
defer f.mu.Unlock()
f.calls = append(f.calls, append([]string(nil), cmd...))
f.images = append(f.images, image)
f.stdin = append(f.stdin, append([]byte(nil), stdin...))
f.mounts = append(f.mounts, append([]string(nil), mounts...))
return f.resp, f.err
}
func (f *fakeRunner) lastCall() (string, []string) {
f.mu.Lock()
defer f.mu.Unlock()
if len(f.calls) == 0 {
return "", nil
}
return f.images[len(f.images)-1], f.calls[len(f.calls)-1]
}
func TestToDocx_UsesPandocImage(t *testing.T) {
f := &fakeRunner{resp: []byte("FAKE-DOCX")}
InstallRunner(f)
t.Cleanup(func() { InstallRunner(nil) })
SetImages("docker.io/pandoc/latex:latest", "")
out, err := ToDocx(context.Background(), []byte("# Hello\n"), Metadata{
Title: "Hello",
Client: "Acme",
})
if err != nil {
t.Fatalf("ToDocx: %v", err)
}
if string(out) != "FAKE-DOCX" {
t.Errorf("unexpected output: %q", out)
}
image, call := f.lastCall()
if image != "docker.io/pandoc/latex:latest" {
t.Errorf("expected pandoc image, got %q", image)
}
if !contains(call, "--to=docx") {
t.Errorf("missing --to=docx: %v", call)
}
if !contains(call, "title=Hello") {
t.Errorf("missing title metadata: %v", call)
}
if !contains(call, "client=Acme") {
t.Errorf("missing client metadata: %v", call)
}
// Last arg must be "-" so pandoc reads from stdin.
if call[len(call)-1] != "-" {
t.Errorf("expected stdin marker as last arg, got %q", call[len(call)-1])
}
}
func TestToHTML_UsesTemplateAndMountsScratch(t *testing.T) {
f := &fakeRunner{resp: []byte("<html>fake</html>")}
InstallRunner(f)
t.Cleanup(func() { InstallRunner(nil) })
SetImages("docker.io/pandoc/latex:latest", "")
_, err := ToHTML(context.Background(), []byte("# Hi\n"), Metadata{Title: "Hi"})
if err != nil {
t.Fatalf("ToHTML: %v", err)
}
image, call := f.lastCall()
if image != "docker.io/pandoc/latex:latest" {
t.Errorf("expected pandoc image, got %q", image)
}
if !contains(call, "--template=/tpl/viewer-template.html") {
t.Errorf("template flag missing: %v", call)
}
if !contains(call, "--toc") {
t.Errorf("TOC flag missing (default NoTOC=false): %v", call)
}
if len(f.mounts) == 0 || len(f.mounts[0]) == 0 {
t.Fatalf("expected at least one bind mount for /tpl")
}
mount := f.mounts[0][0]
if !strings.Contains(mount, ":/tpl:") {
t.Errorf("mount missing /tpl: %q", mount)
}
}
func TestToHTML_NoTOCSuppressesTOC(t *testing.T) {
f := &fakeRunner{resp: []byte("<html/>")}
InstallRunner(f)
t.Cleanup(func() { InstallRunner(nil) })
_, _ = ToHTML(context.Background(), []byte("# Hi\n"), Metadata{NoTOC: true})
_, call := f.lastCall()
if contains(call, "--toc") {
t.Errorf("TOC should be suppressed when NoTOC=true: %v", call)
}
if !contains(call, "no-toc=true") {
t.Errorf("no-toc metadata variable missing: %v", call)
}
}
// recordingRunner records every call and returns canned responses
// in sequence. Lets ToPDF tests assert the two-stage pipeline
// (pandoc image then chromium image).
type recordingRunner struct {
mu sync.Mutex
calls []recordedCall
resp [][]byte
err []error
cursor int
}
type recordedCall struct {
image string
cmd []string
mounts []string
}
func (r *recordingRunner) Run(_ context.Context, image string, _ []byte, mounts []string, cmd []string) ([]byte, error) {
r.mu.Lock()
defer r.mu.Unlock()
r.calls = append(r.calls, recordedCall{
image: image,
cmd: append([]string(nil), cmd...),
mounts: append([]string(nil), mounts...),
})
if r.cursor >= len(r.resp) {
return nil, nil
}
out := r.resp[r.cursor]
var e error
if r.cursor < len(r.err) {
e = r.err[r.cursor]
}
r.cursor++
return out, e
}
func TestToPDF_TwoStagePipeline(t *testing.T) {
// Stage 1: pandoc emits HTML. Stage 2: chromium reads HTML from
// the bind mount and writes /pdf/out.pdf. The fake runner can't
// actually write the PDF, so we expect ToPDF to fail at the
// read-back step — but we can still assert the two-stage call
// shape and the right image per stage.
r := &recordingRunner{
resp: [][]byte{
[]byte("<html><body>fake</body></html>"), // stage 1 stdout
nil, // stage 2 stdout (chromium writes PDF to bind mount)
},
}
InstallRunner(r)
t.Cleanup(func() { InstallRunner(nil) })
SetImages("docker.io/pandoc/latex:latest", "docker.io/zenika/alpine-chrome:latest")
_, err := ToPDF(context.Background(), []byte("# Hi\n"), Metadata{})
// PDF read-back will fail (fake runner didn't write the file) —
// that's expected for this test which only inspects the call
// shape.
if err == nil {
t.Fatalf("expected error from PDF read-back; got nil")
}
if len(r.calls) != 2 {
t.Fatalf("expected 2 container calls (pandoc + chromium); got %d", len(r.calls))
}
if r.calls[0].image != "docker.io/pandoc/latex:latest" {
t.Errorf("stage 1 image: got %q want pandoc/latex", r.calls[0].image)
}
if r.calls[1].image != "docker.io/zenika/alpine-chrome:latest" {
t.Errorf("stage 2 image: got %q want alpine-chrome", r.calls[1].image)
}
// Stage 2 must include the --print-to-pdf flag pointing at /pdf.
if !contains(r.calls[1].cmd, "--print-to-pdf=/pdf/out.pdf") {
t.Errorf("chromium call missing --print-to-pdf flag: %v", r.calls[1].cmd)
}
if !contains(r.calls[1].cmd, "--no-sandbox") {
t.Errorf("chromium call missing --no-sandbox: %v", r.calls[1].cmd)
}
// Stage 2's bind mount must be writable (chromium writes the PDF).
if len(r.calls[1].mounts) == 0 || !strings.Contains(r.calls[1].mounts[0], ":rw") {
t.Errorf("chromium mount must be :rw, got %v", r.calls[1].mounts)
}
}
func TestErrUnavailable_WhenNoRunner(t *testing.T) {
InstallRunner(nil)
_, err := ToDocx(context.Background(), []byte("x"), Metadata{})
if !errors.Is(err, ErrUnavailable) {
t.Errorf("expected ErrUnavailable, got %v", err)
}
}
func TestMetadataArgs_OmitsEmptyAndOrdersStably(t *testing.T) {
args := metadataArgs(Metadata{
Title: "T",
Project: "P",
GenerationTime: time.Date(2026, 5, 13, 14, 30, 22, 0, time.UTC),
})
want := []string{
"-V", "title=T",
"-V", "project=P",
}
for i, w := range want {
if i >= len(args) || args[i] != w {
t.Fatalf("args[%d]: got %v want prefix %v", i, args, want)
}
}
joined := strings.Join(args, "|")
if !strings.Contains(joined, "generation_time=") || !strings.Contains(joined, "2026") {
t.Errorf("generation_time missing or malformed: %v", args)
}
if strings.Contains(joined, "client=") {
t.Errorf("empty client should not be passed: %v", args)
}
}
func TestImageTag(t *testing.T) {
cases := map[string]string{
"docker.io/pandoc/latex:latest": "pandoc/latex",
"docker.io/zenika/alpine-chrome:latest": "zenika/alpine-chrome",
"pandoc/core": "pandoc/core",
"quay.io/example/foo:v1": "example/foo",
"alpine": "alpine",
}
for in, want := range cases {
if got := imageTag(in); got != want {
t.Errorf("imageTag(%q) = %q, want %q", in, got, want)
}
}
}
func TestSingleflight_Collapses(t *testing.T) {
var g singleflightGroup
const N = 50
var wg sync.WaitGroup
var hits int32
var mu sync.Mutex
wg.Add(N)
for i := 0; i < N; i++ {
go func() {
defer wg.Done()
_, _ = g.Do("k", func() (any, error) {
mu.Lock()
hits++
mu.Unlock()
time.Sleep(20 * time.Millisecond)
return "v", nil
})
}()
}
wg.Wait()
if hits != 1 {
t.Errorf("singleflight collapse: got %d invocations, want 1", hits)
}
}
// contains reports whether haystack has needle as any of its elements.
func contains(haystack []string, needle string) bool {
for _, s := range haystack {
if s == needle {
return true
}
}
return false
}


@@ -0,0 +1,163 @@
/*
* Legal-style heading numbering for ZDDC documents
* Adds hierarchical numbering like 1, 1.1, 1.1.1, etc.
*/
/* Reset counters at document level */
.document-content {
counter-reset: h1-counter;
}
/* H1 counters */
h1 {
counter-reset: h2-counter h3-counter h4-counter h5-counter h6-counter;
counter-increment: h1-counter;
}
h1::before {
content: counter(h1-counter) ". ";
font-weight: bold;
color: var(--primary-color);
}
/* H2 counters */
h2 {
counter-reset: h3-counter h4-counter h5-counter h6-counter;
counter-increment: h2-counter;
}
h2::before {
content: counter(h1-counter) "." counter(h2-counter) " ";
font-weight: bold;
color: var(--primary-color);
}
/* H3 counters */
h3 {
counter-reset: h4-counter h5-counter h6-counter;
counter-increment: h3-counter;
}
h3::before {
content: counter(h1-counter) "." counter(h2-counter) "." counter(h3-counter) " ";
font-weight: bold;
color: var(--primary-color);
}
/* H4 counters */
h4 {
counter-reset: h5-counter h6-counter;
counter-increment: h4-counter;
}
h4::before {
content: counter(h1-counter) "." counter(h2-counter) "." counter(h3-counter) "." counter(h4-counter) " ";
font-weight: bold;
color: var(--primary-color);
}
/* H5 counters */
h5 {
counter-reset: h6-counter;
counter-increment: h5-counter;
}
h5::before {
content: counter(h1-counter) "." counter(h2-counter) "." counter(h3-counter) "." counter(h4-counter) "." counter(h5-counter) " ";
font-weight: bold;
color: var(--primary-color);
}
/* H6 counters */
h6 {
counter-increment: h6-counter;
}
h6::before {
content: counter(h1-counter) "." counter(h2-counter) "." counter(h3-counter) "." counter(h4-counter) "." counter(h5-counter) "." counter(h6-counter) " ";
font-weight: bold;
color: var(--primary-color);
}
/* TOC numbering to match document headings */
.toc {
counter-reset: toc-h1;
}
.toc ul {
list-style: none;
}
.toc > ul > li {
counter-increment: toc-h1;
counter-reset: toc-h2 toc-h3 toc-h4 toc-h5 toc-h6;
}
.toc > ul > li > a::before {
content: counter(toc-h1) ". ";
font-weight: bold;
color: var(--primary-color);
margin-right: 0.25em;
}
.toc > ul > li > ul > li {
counter-increment: toc-h2;
counter-reset: toc-h3 toc-h4 toc-h5 toc-h6;
}
.toc > ul > li > ul > li > a::before {
content: counter(toc-h1) "." counter(toc-h2) " ";
font-weight: bold;
color: var(--primary-color);
margin-right: 0.25em;
}
.toc > ul > li > ul > li > ul > li {
counter-increment: toc-h3;
counter-reset: toc-h4 toc-h5 toc-h6;
}
.toc > ul > li > ul > li > ul > li > a::before {
content: counter(toc-h1) "." counter(toc-h2) "." counter(toc-h3) " ";
font-weight: bold;
color: var(--primary-color);
margin-right: 0.25em;
}
/* Optional: Add some spacing after the numbers */
h1::before, h2::before, h3::before, h4::before, h5::before, h6::before {
margin-right: 0.5em;
}
/* Print-specific adjustments */
@media print {
h1::before, h2::before, h3::before, h4::before, h5::before, h6::before {
color: #000 !important; /* Ensure numbers print in black */
}
}
/* Optional: Style adjustments for better visual hierarchy */
h1 {
border-bottom: 2px solid var(--primary-color);
padding-bottom: 0.3em;
margin-top: 1em;
}
/* Reduce margin for first heading */
h1:first-of-type {
margin-top: 0.5em;
}
h2 {
border-bottom: 1px solid var(--border-color);
padding-bottom: 0.2em;
margin-top: 1.5em;
}
h3 {
margin-top: 1.2em;
}
h4, h5, h6 {
margin-top: 1em;
}

@ -0,0 +1,19 @@
package convert
import _ "embed"
// Pandoc HTML template and its companion stylesheet, copied verbatim from
// /pandoc/viewer-template.html and /pandoc/custom.css. The runner writes
// these to a host scratch dir on each conversion and bind-mounts them
// read-only into the container so pandoc can `--template` against them.
//
// Refresh: when /pandoc/viewer-template.html changes, copy the new bytes
// here. There's no symlink because go:embed paths must resolve under the
// containing module — and we want the binary to ship the bytes verbatim,
// not depend on the source tree at runtime.
//go:embed viewer-template.html
var viewerTemplate []byte
//go:embed custom.css
var customCSS []byte

@ -0,0 +1,152 @@
package convert
import (
"context"
"fmt"
"log/slog"
"os/exec"
"strings"
"sync"
"sync/atomic"
"time"
)
// Capabilities is the snapshot of "can we convert right now?". The
// only hard requirement is a container runtime on PATH — image presence
// is left to `--pull=missing` at conversion time, so a missing image
// surfaces as a normal ConvertError (not a probe failure).
type Capabilities struct {
Engine string // resolved path to the podman or docker binary; "" if none found
EngineVer string // first line of "<engine> --version"
PandocImage string // resolved pandoc image ref
ChromiumImage string // resolved chromium image ref
ProbedAt time.Time
Err error
}
// Ready reports whether conversions can be attempted. The first
// conversion may still fail if the configured image isn't reachable
// from the host's registry (the runner will surface a clear error
// from podman/docker stderr).
func (c Capabilities) Ready() bool {
return c.Engine != "" && c.Err == nil
}
// Reason returns a short human-friendly explanation when Ready() is
// false. Used as the body of a 503.
func (c Capabilities) Reason() string {
if c.Engine == "" {
return "no container runtime (podman or docker) found on PATH"
}
if c.Err != nil {
return c.Err.Error()
}
return "unavailable"
}
var (
caps atomic.Pointer[Capabilities]
probeCool sync.Mutex
)
// Available returns the current Capabilities snapshot and whether
// conversions can proceed.
func Available() (Capabilities, bool) {
p := caps.Load()
if p == nil {
return Capabilities{}, false
}
return *p, p.Ready()
}
// Probe locates the container engine and installs a containerRunner
// as the package default. Call once at server startup. Returns the
// captured Capabilities for logging.
//
// Engine order: engineOverride (if non-empty) → podman → docker. First
// hit wins. Image presence is NOT probed: the runner uses
// `--pull=missing` so the first conversion request will pull whichever
// image it needs.
//
// Any failure here is non-fatal: the server still starts and the
// conversion endpoints return 503. This matches the locked-in
// requirement that no container runtime means no conversions.
func Probe(ctx context.Context, engineOverride string) Capabilities {
probeCool.Lock()
defer probeCool.Unlock()
now := time.Now()
c := Capabilities{
PandocImage: currentPandocImage(),
ChromiumImage: currentChromiumImage(),
ProbedAt: now,
}
engine := resolveEngine(engineOverride)
if engine == "" {
c.Err = fmt.Errorf("no container runtime found (tried: %s)", strings.Join(enginesTried(engineOverride), ", "))
caps.Store(&c)
slog.Warn("convert: probe failed", "reason", c.Err.Error())
return c
}
c.Engine = engine
if v, err := probeVersion(ctx, engine); err == nil {
c.EngineVer = v
}
InstallRunner(newContainerRunner(engine))
caps.Store(&c)
slog.Info("convert: ready",
"engine", engine,
"engine_version", c.EngineVer,
"pandoc_image", c.PandocImage,
"chromium_image", c.ChromiumImage)
return c
}
// Reprobe re-runs Probe with the existing configuration. Used by the
// handler when a request hits a not-Ready state — gives the operator
// a way to recover (e.g. installed podman after the server started)
// without a server restart. Cooldown of 60 s between probes to keep
// error-path requests cheap.
func Reprobe(ctx context.Context, engineOverride string) Capabilities {
if p := caps.Load(); p != nil {
if time.Since(p.ProbedAt) < 60*time.Second {
return *p
}
}
return Probe(ctx, engineOverride)
}
func resolveEngine(override string) string {
if override != "" {
if p, err := exec.LookPath(override); err == nil {
return p
}
return ""
}
for _, name := range []string{"podman", "docker"} {
if p, err := exec.LookPath(name); err == nil {
return p
}
}
return ""
}
func enginesTried(override string) []string {
if override != "" {
return []string{override}
}
return []string{"podman", "docker"}
}
func probeVersion(ctx context.Context, engine string) (string, error) {
c := exec.CommandContext(ctx, engine, "--version")
out, err := c.CombinedOutput()
if err != nil {
return "", err
}
line := strings.SplitN(strings.TrimSpace(string(out)), "\n", 2)[0]
return line, nil
}
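
The engine order described for Probe (override alone if set, otherwise podman before docker) can be exercised without touching the real PATH. The following is a minimal sketch, not the package's code: `lookupFunc`, `resolve`, and `fakePath` are hypothetical names, and the lookup is injected where the original calls `exec.LookPath` directly.

```go
package main

import (
	"errors"
	"fmt"
)

// lookupFunc has the same shape as exec.LookPath. Hypothetical type,
// introduced only so the resolution order can be tested with a fake.
type lookupFunc func(name string) (string, error)

// resolve returns the first engine path found. A non-empty override is
// tried alone; otherwise podman is tried before docker.
func resolve(override string, look lookupFunc) string {
	candidates := []string{"podman", "docker"}
	if override != "" {
		candidates = []string{override}
	}
	for _, name := range candidates {
		if p, err := look(name); err == nil {
			return p
		}
	}
	return ""
}

// fakePath builds a lookupFunc that only "finds" the named binaries.
func fakePath(installed ...string) lookupFunc {
	return func(name string) (string, error) {
		for _, have := range installed {
			if have == name {
				return "/usr/bin/" + name, nil
			}
		}
		return "", errors.New("not found: " + name)
	}
}

func main() {
	fmt.Println(resolve("", fakePath("docker")))              // prints /usr/bin/docker
	fmt.Println(resolve("", fakePath("podman", "docker")))    // prints /usr/bin/podman
	fmt.Println(resolve("nerdctl", fakePath("docker")) == "") // prints true
}
```

Note that a missing override does not fall back to podman or docker, which matches the early `return ""` in resolveEngine.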

@ -0,0 +1,386 @@
package convert
import (
"bytes"
"context"
"errors"
"fmt"
"io"
"io/fs"
"os"
"os/exec"
"path/filepath"
"strings"
"sync"
"time"
)
// Runner executes a conversion sub-process and returns its stdout.
// The host-side implementation (containerRunner) wraps `podman run`
// or `docker run`; tests use a fake.
//
// image is the OCI image to invoke (e.g. "docker.io/pandoc/latex:latest"
// or "docker.io/zenika/alpine-chrome:latest"). stdin is piped to the
// container's stdin. cmd is the argv passed *to the image's entrypoint*
// — for pandoc/latex the entrypoint is `pandoc`, for alpine-chrome it
// is `chromium-browser`. mounts is a list of "<hostPath>:<containerPath>"
// specs handed to --volume (":ro" is added if no mode segment is
// present).
//
// All exec calls in this package go through Runner.Run. This is the
// first os/exec site in the codebase; the hardening here is the
// pattern for future shell-outs.
type Runner interface {
Run(ctx context.Context, image string, stdin []byte, mounts []string, cmd []string) ([]byte, error)
}
// ErrUnavailable means no container runtime is present on the host.
// Handlers translate to HTTP 503.
var ErrUnavailable = errors.New("conversion unavailable")
// ConvertError carries the failure surface from a non-zero exit.
// Stderr is captured (truncated to 4 KiB by the runner) so callers can
// surface pandoc/chromium's own complaint.
type ConvertError struct {
Tool string // short image name (see imageTag); labels the error message
ExitCode int
Stderr string
Cause error
}
func (e *ConvertError) Error() string {
if e == nil {
return "<nil>"
}
if e.Stderr != "" {
return fmt.Sprintf("%s exit %d: %s", e.Tool, e.ExitCode, strings.TrimSpace(e.Stderr))
}
return fmt.Sprintf("%s exit %d: %v", e.Tool, e.ExitCode, e.Cause)
}
func (e *ConvertError) Unwrap() error { return e.Cause }
// containerRunner runs each conversion inside a fresh container.
// The engine ("podman" preferred, "docker" fallback) is resolved once
// at startup by Probe. Resource limits are configurable via
// SetLimits (reached through ConfigureLimits, which main.go calls
// after flag parsing). Images are passed per call so the same runner
// handles both pandoc and chromium invocations.
//
// The runner relies on `--pull=missing` so the operator never has to
// pre-pull images: the first request that needs an image pulls it,
// subsequent requests use the local cache. Both podman and docker
// honour this flag identically.
type containerRunner struct {
mu sync.RWMutex
engine string
memMiB int
cpus string
pids int
timeout time.Duration
}
var (
// shared default runner, populated by InstallRunner (called from
// the health probe at startup once the engine is known).
defaultRunnerMu sync.RWMutex
defaultRunner Runner
)
// InstallRunner sets the package-level Runner used by ToDocx/ToHTML/ToPDF.
// Tests inject a fake; production code lets the health probe install a
// containerRunner. Safe to call from multiple goroutines.
func InstallRunner(r Runner) {
defaultRunnerMu.Lock()
defaultRunner = r
defaultRunnerMu.Unlock()
}
// ConfigureLimits applies resource limits to the package-level Runner,
// if it's a containerRunner. No-op when no runner is installed yet
// (the probe failed) or when the installed runner doesn't accept
// limits (e.g. a test fake). Zero values keep the previous setting.
//
// Called from cmd/zddc-server/main.go after Probe so the limits from
// the operator's flags take effect before any conversion request lands.
func ConfigureLimits(memMiB int, cpus string, pids int, timeout time.Duration) {
defaultRunnerMu.RLock()
r := defaultRunner
defaultRunnerMu.RUnlock()
if cr, ok := r.(*containerRunner); ok {
cr.SetLimits(memMiB, cpus, pids, timeout)
}
}
func currentRunner() Runner {
defaultRunnerMu.RLock()
r := defaultRunner
defaultRunnerMu.RUnlock()
return r
}
// SetLimits updates the resource ceilings used for subsequent Run
// invocations. Zero values keep the previous setting (or the defaults
// set at construction). Safe to call from multiple goroutines.
func (cr *containerRunner) SetLimits(memMiB int, cpus string, pids int, timeout time.Duration) {
cr.mu.Lock()
defer cr.mu.Unlock()
if memMiB > 0 {
cr.memMiB = memMiB
}
if cpus != "" {
cr.cpus = cpus
}
if pids > 0 {
cr.pids = pids
}
if timeout > 0 {
cr.timeout = timeout
}
}
func newContainerRunner(engine string) *containerRunner {
return &containerRunner{
engine: engine,
memMiB: 512,
cpus: "2",
pids: 100,
timeout: 30 * time.Second,
}
}
// Run executes one container invocation. cmd is the argv passed to the
// image's entrypoint (pandoc for pandoc/latex, chromium-browser for
// alpine-chrome). mounts is a list of "<hostPath>:<containerPath>"
// strings; ":ro" is appended when no mode segment is present. stdin is
// piped to the container, stdout is returned as bytes (capped at
// 128 MiB).
//
// Hardening:
// - --pull=missing: image is fetched on first use, cached after.
// Operator only needs podman/docker installed; no manual pull.
// - --rm: container is removed on exit, even if killed.
// - --network=none: no network inside the container. Prevents data
// exfiltration through embedded URLs in source documents.
// - --read-only + tmpfs on /tmp and /run: image fs is immutable;
// pandoc/chromium scratch goes to tmpfs only.
// - --memory / --cpus / --pids-limit: kernel-enforced caps.
// - --cap-drop=ALL + --security-opt=no-new-privileges: standard
// container-escape hardening.
// - context-cancel kill + WaitDelay: a wedged podman gets force-
// killed; pipes drop after 2s so we don't leak goroutines.
// - cmd.Env minimal: only PATH + HOME are passed through to the
// engine binary; the container itself sees only what the image
// bakes in plus what --env adds (HOME=/tmp).
//
// Note: --user is intentionally NOT set so each image uses its
// default user (pandoc/latex runs as root, alpine-chrome runs as
// uid 1000). With --read-only + tmpfs + --cap-drop=ALL +
// --network=none + --security-opt=no-new-privileges the additional
// defense from forcing the nobody user is small, and doing so would
// break alpine-chrome's own user-data-dir layout.
func (cr *containerRunner) Run(ctx context.Context, image string, stdin []byte, mounts []string, cmd []string) ([]byte, error) {
cr.mu.RLock()
engine := cr.engine
memMiB := cr.memMiB
cpus := cr.cpus
pids := cr.pids
timeout := cr.timeout
cr.mu.RUnlock()
if engine == "" {
return nil, ErrUnavailable
}
if image == "" {
return nil, fmt.Errorf("convert.Run: image is empty")
}
runCtx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
args := []string{
"run",
"--rm",
"--pull=missing",
"-i",
"--network=none",
"--read-only",
"--tmpfs=/tmp:size=128m,exec",
"--tmpfs=/run:size=4m",
fmt.Sprintf("--memory=%dm", memMiB),
fmt.Sprintf("--cpus=%s", cpus),
fmt.Sprintf("--pids-limit=%d", pids),
"--cap-drop=ALL",
"--security-opt=no-new-privileges",
"--env=HOME=/tmp",
"--workdir=/tmp",
}
for _, m := range mounts {
if !strings.Contains(m, ":ro") && !strings.Contains(m, ":rw") {
m += ":ro"
}
args = append(args, "--volume="+m)
}
args = append(args, image)
args = append(args, cmd...)
c := exec.CommandContext(runCtx, engine, args...)
c.Cancel = func() error {
if c.Process == nil {
return nil
}
return c.Process.Kill()
}
c.WaitDelay = 2 * time.Second
c.SysProcAttr = sysProcAttr()
c.Env = []string{
"PATH=" + os.Getenv("PATH"),
"HOME=" + os.TempDir(),
}
c.Stdin = bytes.NewReader(stdin)
var stdoutBuf bytes.Buffer
c.Stdout = &limitWriter{w: &stdoutBuf, max: 128 << 20}
stderr := newRingWriter(4 << 10)
c.Stderr = stderr
err := c.Run()
if err != nil {
exitCode := -1
if ee, ok := err.(*exec.ExitError); ok {
exitCode = ee.ExitCode()
}
toolName := imageTag(image)
if runCtx.Err() == context.DeadlineExceeded {
return nil, &ConvertError{
Tool: toolName,
ExitCode: exitCode,
Stderr: stderr.String(),
Cause: fmt.Errorf("timeout after %s: %w", timeout, runCtx.Err()),
}
}
return nil, &ConvertError{
Tool: toolName,
ExitCode: exitCode,
Stderr: stderr.String(),
Cause: err,
}
}
return stdoutBuf.Bytes(), nil
}
// imageTag extracts a short name for an image reference, used as the
// "Tool" label on ConvertError. "docker.io/pandoc/latex:latest" →
// "pandoc/latex".
func imageTag(image string) string {
s := image
// Strip registry prefix.
if i := strings.Index(s, "/"); i >= 0 {
if strings.Contains(s[:i], ".") || strings.Contains(s[:i], ":") {
s = s[i+1:]
}
}
// Strip tag suffix.
if i := strings.LastIndex(s, ":"); i >= 0 {
s = s[:i]
}
return s
}
// limitWriter caps the underlying buffer at max bytes. A write that
// would pass the cap stores what still fits and returns an "output
// exceeded" error, which surfaces as a Run() error; the handler edge
// then maps it to 422 (output too large).
type limitWriter struct {
w io.Writer
max int64
n int64
}
func (l *limitWriter) Write(p []byte) (int, error) {
if l.n >= l.max {
return 0, fmt.Errorf("output exceeded %d bytes", l.max)
}
rem := l.max - l.n
if int64(len(p)) > rem {
n, _ := l.w.Write(p[:rem])
l.n += int64(n)
return n, fmt.Errorf("output exceeded %d bytes", l.max)
}
n, err := l.w.Write(p)
l.n += int64(n)
return n, err
}
// ringWriter keeps only the tail of what's written — useful for stderr
// capture where the most-recent bytes are the ones with the actual
// error message and earlier output is usually progress noise.
type ringWriter struct {
mu sync.Mutex
buf []byte
max int
}
func newRingWriter(max int) *ringWriter {
return &ringWriter{max: max}
}
func (r *ringWriter) Write(p []byte) (int, error) {
r.mu.Lock()
defer r.mu.Unlock()
if len(p) >= r.max {
r.buf = append(r.buf[:0], p[len(p)-r.max:]...)
return len(p), nil
}
r.buf = append(r.buf, p...)
if len(r.buf) > r.max {
r.buf = r.buf[len(r.buf)-r.max:]
}
return len(p), nil
}
func (r *ringWriter) String() string {
r.mu.Lock()
defer r.mu.Unlock()
return string(r.buf)
}
// writeAssetsToScratch materialises the embedded viewer-template.html
// and custom.css into a fresh scratch dir under TMPDIR and returns the
// host path. Caller is responsible for os.RemoveAll(dir) when done.
// Used by ToHTML which needs the template visible inside the container.
//
// Files are written world-readable so the container's default user
// (root for pandoc/latex, uid 1000 for alpine-chrome) can read them
// through the read-only bind mount regardless of the host's umask.
func writeAssetsToScratch() (string, error) {
dir, err := os.MkdirTemp("", "zddc-convert-")
if err != nil {
return "", fmt.Errorf("scratch dir: %w", err)
}
if err := os.WriteFile(filepath.Join(dir, "viewer-template.html"), viewerTemplate, 0o644); err != nil {
os.RemoveAll(dir)
return "", fmt.Errorf("write template: %w", err)
}
if err := os.WriteFile(filepath.Join(dir, "custom.css"), customCSS, 0o644); err != nil {
os.RemoveAll(dir)
return "", fmt.Errorf("write css: %w", err)
}
if err := chmodTree(dir, 0o755, 0o644); err != nil {
os.RemoveAll(dir)
return "", err
}
return dir, nil
}
func chmodTree(root string, dirMode, fileMode os.FileMode) error {
return filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
if err != nil {
return err
}
if d.IsDir() {
return os.Chmod(p, dirMode)
}
return os.Chmod(p, fileMode)
})
}
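
The tail-keeping behaviour of the stderr capture can be seen in isolation. This is a minimal sketch under stated assumptions: `tailBuffer` is a hypothetical stand-in for ringWriter, it omits the mutex, and the 8-byte window is chosen only to make the truncation visible (the runner uses 4 KiB).

```go
package main

import "fmt"

// tailBuffer keeps at most max trailing bytes of everything written.
// Hypothetical stand-in for the runner's stderr capture; unlike the
// original it is not safe for concurrent writers.
type tailBuffer struct {
	buf []byte
	max int
}

func (t *tailBuffer) Write(p []byte) (int, error) {
	if len(p) >= t.max {
		// The new chunk alone fills the window: keep only its tail.
		t.buf = append(t.buf[:0], p[len(p)-t.max:]...)
		return len(p), nil
	}
	t.buf = append(t.buf, p...)
	if over := len(t.buf) - t.max; over > 0 {
		// Drop the oldest bytes so only the last max remain.
		t.buf = t.buf[over:]
	}
	// Always report the full length so the producer never sees a
	// short write, even though older bytes were discarded.
	return len(p), nil
}

func (t *tailBuffer) String() string { return string(t.buf) }

func main() {
	t := &tailBuffer{max: 8}
	fmt.Fprint(t, "progress 10%\n")
	fmt.Fprint(t, "progress 90%\n")
	fmt.Fprint(t, "fatal: x")
	fmt.Printf("%q\n", t.String()) // prints "fatal: x"
}
```

Keeping the tail rather than the head is the design choice the ringWriter comment describes: the final lines of stderr usually carry the actual error.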

@ -0,0 +1,43 @@
package convert
import "sync"
// singleflightGroup deduplicates concurrent calls keyed by string. If N
// goroutines call Do(key, fn) before the first one returns, fn runs once
// and all callers receive the same (val, err).
//
// Copy of internal/apps/singleflight.go so this package has no internal
// cross-imports. If a third caller appears, lift to internal/sf/.
type singleflightGroup struct {
mu sync.Mutex
m map[string]*sfCall
}
type sfCall struct {
done chan struct{}
val any
err error
}
func (g *singleflightGroup) Do(key string, fn func() (any, error)) (any, error) {
g.mu.Lock()
if g.m == nil {
g.m = make(map[string]*sfCall)
}
if c, ok := g.m[key]; ok {
g.mu.Unlock()
<-c.done
return c.val, c.err
}
c := &sfCall{done: make(chan struct{})}
g.m[key] = c
g.mu.Unlock()
c.val, c.err = fn()
close(c.done)
g.mu.Lock()
delete(g.m, key)
g.mu.Unlock()
return c.val, c.err
}
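
To see the collapse from a caller's side, the sketch below starts several goroutines on one key and counts how often the function body actually runs. It is a hedged illustration: `group`, `call`, and `collapse` are hypothetical names, `Do` mirrors the method above, and the 20 ms sleep only makes it likely (not guaranteed) that followers arrive while the leader is still in flight.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

type call struct {
	done chan struct{}
	val  any
	err  error
}

type group struct {
	mu sync.Mutex
	m  map[string]*call
}

// Do runs fn once per in-flight key; late arrivals wait for the leader
// and share its result.
func (g *group) Do(key string, fn func() (any, error)) (any, error) {
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok {
		g.mu.Unlock()
		<-c.done
		return c.val, c.err
	}
	c := &call{done: make(chan struct{})}
	g.m[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()
	close(c.done)

	g.mu.Lock()
	delete(g.m, key)
	g.mu.Unlock()
	return c.val, c.err
}

// collapse starts n concurrent callers on one key and reports how many
// times fn ran, plus the value each caller received.
func collapse(n int) (int32, []any) {
	var (
		g    group
		wg   sync.WaitGroup
		runs int32
	)
	vals := make([]any, n)
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func(i int) {
			defer wg.Done()
			vals[i], _ = g.Do("k", func() (any, error) {
				atomic.AddInt32(&runs, 1)
				time.Sleep(20 * time.Millisecond) // hold the flight open
				return "v", nil
			})
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt32(&runs), vals
}

func main() {
	runs, vals := collapse(50)
	fmt.Printf("callers=%d fn-runs=%d first=%v\n", len(vals), runs, vals[0])
}
```

Because the key is deleted once the leader finishes, a caller that arrives after that point starts a fresh flight; deduplication covers only overlapping calls.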

@ -0,0 +1,20 @@
//go:build linux
package convert
import "syscall"
// sysProcAttr returns the platform-specific SysProcAttr for the
// container-engine child.
//
// - Setpgid: put the child in its own process group so a kill
// targeted at -pid can reach grandchildren too (podman/docker spawn
// helper processes for chromium). Run's Cancel hook calls
// Process.Kill, which signals only the direct child.
// - Pdeathsig: SIGKILL the child if the zddc-server parent exits.
// This is a Linux-only feature (other platforms get only Setpgid).
func sysProcAttr() *syscall.SysProcAttr {
return &syscall.SysProcAttr{
Setpgid: true,
Pdeathsig: syscall.SIGKILL,
}
}

@ -0,0 +1,17 @@
//go:build darwin || freebsd || netbsd || openbsd
package convert
import "syscall"
// sysProcAttr returns the platform-specific SysProcAttr for the
// container-engine child. BSD-family targets get Setpgid only (no
// Pdeathsig); a zddc-server crash mid-conversion may leave the
// detached engine process running on macOS/BSD. In practice
// production deployments are Linux containers where the full
// hardening applies.
func sysProcAttr() *syscall.SysProcAttr {
return &syscall.SysProcAttr{
Setpgid: true,
}
}

@ -0,0 +1,14 @@
//go:build windows
package convert
import "syscall"
// sysProcAttr returns the platform-specific SysProcAttr for the
// container-engine child. Windows: no Setpgid / Pdeathsig analogue;
// process-group semantics differ. We rely on context cancel +
// cmd.Process.Kill() + WaitDelay for cleanup. In practice production
// deployments are Linux containers where the full hardening applies.
func sysProcAttr() *syscall.SysProcAttr {
return &syscall.SysProcAttr{}
}

File diff suppressed because it is too large

@ -0,0 +1,297 @@
package handler
import (
"context"
"errors"
"fmt"
"log/slog"
"net/http"
"os"
"path/filepath"
"strings"
"time"
"codeberg.org/VARASYS/ZDDC/zddc/internal/config"
"codeberg.org/VARASYS/ZDDC/zddc/internal/convert"
"codeberg.org/VARASYS/ZDDC/zddc/internal/zddc"
)
// On-demand MD→{docx,html,pdf} conversion endpoint.
//
// GET /<path>/foo.md?convert=docx|html|pdf
//
// The source file's read policy (already enforced by the dispatcher
// before this handler runs) gates the response. The converted bytes
// are cached at <dir>/.converted/<base>.<ext>, with mtime synced to the
// source — so a fast-path GET that finds a fresh cache hit serves the
// disk file via http.ServeContent without invoking pandoc at all.
//
// When the cache is stale (or absent) the handler:
// 1. Reads source bytes.
// 2. Walks the .zddc cascade to assemble the convert.Metadata.
// 3. Calls convert.ToDocx / ToHTML / ToPDF (containerised pandoc).
// 4. Atomically writes the result to .converted/ and syncs mtime.
// 5. Serves it.
//
// Concurrent requests for the same URL share a single conversion via
// the singleflightGroup keyed by the cached-file absolute path.
var convertSF singleflightGroup
// convertTimeout bounds the slow-path conversion + write + serve. The
// runner itself enforces a finer-grained timeout on the container.
const convertTimeout = 90 * time.Second
// ServeConverted is the entry point. format is the requested target
// extension; chain is the already-resolved ACL chain (re-used here
// only to extract the convert: cascade metadata).
func ServeConverted(cfg config.Config, w http.ResponseWriter, r *http.Request, srcAbs, format string, chain zddc.PolicyChain) {
format = strings.ToLower(strings.TrimSpace(format))
switch format {
case "docx", "html", "pdf":
default:
http.Error(w, "Bad Request — convert must be docx, html, or pdf", http.StatusBadRequest)
return
}
caps, ok := convert.Available()
if !ok {
// One re-probe attempt: lets the operator recover after
// installing a container runtime, without restarting the server.
caps = convert.Reprobe(r.Context(), os.Getenv("ZDDC_CONVERT_ENGINE"))
if !caps.Ready() {
w.Header().Set("Retry-After", "60")
http.Error(w, "Service Unavailable — "+caps.Reason(), http.StatusServiceUnavailable)
return
}
}
srcInfo, err := os.Stat(srcAbs)
if err != nil {
if errors.Is(err, os.ErrNotExist) {
http.Error(w, "Not Found", http.StatusNotFound)
} else {
http.Error(w, "Internal Server Error", http.StatusInternalServerError)
}
return
}
if srcInfo.IsDir() {
http.Error(w, "Bad Request — convert applies to files", http.StatusBadRequest)
return
}
base := strings.TrimSuffix(filepath.Base(srcAbs), filepath.Ext(srcAbs))
dir := filepath.Dir(srcAbs)
cacheDir := filepath.Join(dir, ".converted")
cacheAbs := filepath.Join(cacheDir, base+"."+format)
// Fast path: cached file present and mtime-equal to source.
if cacheInfo, err := os.Stat(cacheAbs); err == nil && cacheInfo.Mode().IsRegular() {
if cacheInfo.ModTime().Equal(srcInfo.ModTime()) {
serveCached(w, r, cacheAbs, format, base)
return
}
}
// Slow path: convert, cache, serve. Singleflight collapses
// concurrent requests for the same target.
_, err = convertSF.Do(cacheAbs, func() (any, error) {
return nil, buildAndStore(r.Context(), srcAbs, srcInfo, cacheDir, cacheAbs, format, base, chain)
})
if err != nil {
mapConvertError(w, err, format)
return
}
serveCached(w, r, cacheAbs, format, base)
}
// buildAndStore reads the source, runs the conversion, atomically
// writes the result, and syncs the cached mtime to the source mtime.
// Returns nil on success; the result lives at cacheAbs.
func buildAndStore(ctx context.Context, srcAbs string, srcInfo os.FileInfo, cacheDir, cacheAbs, format, base string, chain zddc.PolicyChain) error {
source, err := os.ReadFile(srcAbs)
if err != nil {
return fmt.Errorf("read source: %w", err)
}
meta := buildMetadata(srcAbs, chain)
ctx, cancel := context.WithTimeout(ctx, convertTimeout)
defer cancel()
var out []byte
switch format {
case "docx":
out, err = convert.ToDocx(ctx, source, meta)
case "html":
out, err = convert.ToHTML(ctx, source, meta)
case "pdf":
out, err = convert.ToPDF(ctx, source, meta)
default:
return fmt.Errorf("unsupported format %q", format)
}
if err != nil {
return err
}
if err := os.MkdirAll(cacheDir, 0o755); err != nil {
return fmt.Errorf("mkdir cache: %w", err)
}
if err := zddc.WriteAtomic(cacheAbs, out); err != nil {
return fmt.Errorf("write cache: %w", err)
}
// Sync mtime to source so the fast-path predicate works on the
// next request. Both atime and mtime get the source's mtime;
// http.ServeContent uses mtime for Last-Modified and conditional GETs.
srcMT := srcInfo.ModTime()
if err := os.Chtimes(cacheAbs, srcMT, srcMT); err != nil {
slog.Warn("convert: chtimes failed (continuing)", "path", cacheAbs, "err", err)
}
return nil
}
// buildMetadata assembles the Metadata used by pandoc -V flags. The
// filename-derived fields (title, tracking_number, revision, status,
// is_draft) come from zddc.ParseFilename; the project-wide fields
// (client/project/contractor/project_number) come from the cascade.
//
// chain.Levels is walked from leaf (last index, most specific) toward
// root, then Embedded as the final fallback. The first non-empty value
// per field wins.
func buildMetadata(srcAbs string, chain zddc.PolicyChain) convert.Metadata {
meta := convert.Metadata{
GenerationTime: time.Now(),
}
name := filepath.Base(srcAbs)
parsed := zddc.ParseFilename(name)
if parsed.Valid {
meta.Title = parsed.Title
meta.TrackingNumber = parsed.TrackingNumber
meta.Revision = parsed.Revision
meta.Status = parsed.Status
meta.IsDraft = strings.Contains(parsed.Revision, "~")
} else {
// Strip extension as a last-resort title.
stem := strings.TrimSuffix(name, filepath.Ext(name))
meta.Title = stem
}
apply := func(zf zddc.ZddcFile) {
if zf.Convert == nil {
return
}
if meta.Client == "" {
meta.Client = zf.Convert.Client
}
if meta.Project == "" {
meta.Project = zf.Convert.Project
}
if meta.Contractor == "" {
meta.Contractor = zf.Convert.Contractor
}
if meta.ProjectNumber == "" {
meta.ProjectNumber = zf.Convert.ProjectNumber
}
}
// Leaf → root.
for i := len(chain.Levels) - 1; i >= 0; i-- {
apply(chain.Levels[i])
}
apply(chain.Embedded)
return meta
}
// serveCached writes the cached file with the correct headers. No ETag
// is set; Last-Modified comes from the cache file's mtime, which tracks
// the source's mtime, so a refresh changes it cleanly.
func serveCached(w http.ResponseWriter, r *http.Request, cacheAbs, format, base string) {
f, err := os.Open(cacheAbs)
if err != nil {
http.Error(w, "Internal Server Error", http.StatusInternalServerError)
return
}
defer f.Close()
info, err := f.Stat()
if err != nil {
http.Error(w, "Internal Server Error", http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", contentTypeFor(format))
w.Header().Set("Content-Disposition", contentDispositionFor(format, base))
w.Header().Set("X-ZDDC-Source", "convert:"+format)
// http.ServeContent handles If-Modified-Since / Range / etc. and
// emits Last-Modified from info.ModTime(). The clients we ship
// don't issue conditional GETs for conversions today, but other
// callers might.
http.ServeContent(w, r, filepath.Base(cacheAbs), info.ModTime(), f)
}
func contentTypeFor(format string) string {
switch format {
case "docx":
return "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
case "html":
return "text/html; charset=utf-8"
case "pdf":
return "application/pdf"
}
return "application/octet-stream"
}
// contentDispositionFor returns the disposition header. HTML is served
// inline (so the browser renders the rich viewer template directly);
// DOCX and PDF are inline too but the browse client adds the anchor's
// `download` attribute, which forces save-as. Filename is the source
// stem + the target extension so the user gets `foo.docx`, not
// `foo.md.docx`.
func contentDispositionFor(format, base string) string {
return fmt.Sprintf(`inline; filename="%s.%s"`, base, format)
}
// purgeConverted removes the cached .converted/<base>.{docx,html,pdf}
// sidecars for an .md source. Called from the file API after a
// successful PUT/DELETE/MOVE so the next GET ?convert= regenerates.
// Best-effort: errors (including "directory doesn't exist") are
// swallowed. Non-.md sources are a no-op so this is safe to call
// unconditionally after any write.
func purgeConverted(srcAbs string) {
if !strings.HasSuffix(strings.ToLower(srcAbs), ".md") {
return
}
dir := filepath.Dir(srcAbs)
base := strings.TrimSuffix(filepath.Base(srcAbs), filepath.Ext(srcAbs))
for _, ext := range []string{".docx", ".html", ".pdf"} {
_ = os.Remove(filepath.Join(dir, ".converted", base+ext))
}
}
func mapConvertError(w http.ResponseWriter, err error, format string) {
if errors.Is(err, convert.ErrUnavailable) {
w.Header().Set("Retry-After", "60")
http.Error(w, "Service Unavailable — conversion runtime not available", http.StatusServiceUnavailable)
return
}
var ce *convert.ConvertError
if errors.As(err, &ce) {
// Timeout → 504. Non-zero exit with stderr → 422 with the
// stderr excerpt so the client can show a real message.
if ce.Cause != nil && errors.Is(ce.Cause, context.DeadlineExceeded) {
http.Error(w, "Gateway Timeout — conversion timed out", http.StatusGatewayTimeout)
return
}
msg := strings.TrimSpace(ce.Stderr)
if msg == "" {
msg = ce.Error()
}
if len(msg) > 1024 {
msg = msg[:1024]
}
http.Error(w, "Unprocessable Entity — "+msg, http.StatusUnprocessableEntity)
return
}
slog.Warn("convert: unexpected error", "format", format, "err", err)
http.Error(w, "Internal Server Error", http.StatusInternalServerError)
}

@ -379,6 +379,9 @@ func serveFilePut(cfg config.Config, w http.ResponseWriter, r *http.Request) {
// Invalidate ETag cache (static.go memoizes by mtime; rename produces
// a fresh mtime so a stale entry is harmless, but clearing is cheap).
etagCacheM.Delete(abs)
// Invalidate any cached MD→{docx,html,pdf} conversions sitting in
// the sibling .converted/ dir for this source.
purgeConverted(abs)
etag := fileETag(body)
w.Header().Set("ETag", `"`+etag+`"`)
@ -433,6 +436,7 @@ func serveFileDelete(cfg config.Config, w http.ResponseWriter, r *http.Request)
return
}
etagCacheM.Delete(abs)
purgeConverted(abs)
w.Header().Set("X-ZDDC-Source", "fileapi:delete")
w.WriteHeader(http.StatusNoContent)
@ -553,6 +557,8 @@ func serveFileMove(cfg config.Config, w http.ResponseWriter, r *http.Request) {
}
etagCacheM.Delete(srcAbs)
etagCacheM.Delete(dstAbs)
purgeConverted(srcAbs)
purgeConverted(dstAbs)
// Compute new ETag from the moved bytes for the response — clients
// that want to keep tracking should pin to this ETag.

@ -0,0 +1,44 @@
package handler
import "sync"
// singleflightGroup deduplicates concurrent calls keyed by string. If N
// goroutines call Do(key, fn) before the first one returns, fn runs once
// and all callers receive the same (val, err).
//
// Copy of internal/apps/singleflight.go (same pattern, no extra
// dependency). The convert package has its own copy too; if a third
// caller appears, lift to internal/sf/.
type singleflightGroup struct {
mu sync.Mutex
m map[string]*sfCall
}
type sfCall struct {
done chan struct{}
val any
err error
}
func (g *singleflightGroup) Do(key string, fn func() (any, error)) (any, error) {
g.mu.Lock()
if g.m == nil {
g.m = make(map[string]*sfCall)
}
if c, ok := g.m[key]; ok {
g.mu.Unlock()
<-c.done
return c.val, c.err
}
c := &sfCall{done: make(chan struct{})}
g.m[key] = c
g.mu.Unlock()
c.val, c.err = fn()
close(c.done)
g.mu.Lock()
delete(g.m, key)
g.mu.Unlock()
return c.val, c.err
}

@ -92,6 +92,22 @@ type Role struct {
Reset bool `yaml:"reset,omitempty" json:"reset,omitempty"`
}
// ConvertMetadata supplies per-project template variables for the
// server-side MD→{docx,html,pdf} conversion endpoint. The handler
// resolves the effective set by walking the .zddc cascade leaf→root
// and, per key, keeping the deepest non-empty value (an empty deeper
// value does NOT clear an ancestor value; operators write the explicit
// string they want).
//
// Variables are passed to pandoc as -V key=value flags and consumed by
// pandoc/viewer-template.html's $if(client)$ / $if(project)$ /
// $if(contractor)$ / $if(project_number)$ blocks.
type ConvertMetadata struct {
	Client        string `yaml:"client,omitempty" json:"client,omitempty"`
	Project       string `yaml:"project,omitempty" json:"project,omitempty"`
	Contractor    string `yaml:"contractor,omitempty" json:"contractor,omitempty"`
	ProjectNumber string `yaml:"project_number,omitempty" json:"project_number,omitempty"`
}
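
The comment above says these fields reach pandoc as -V key=value flags. A minimal sketch of that mapping, assuming empty fields are simply skipped so the template's $if(...)$ blocks stay false; `pandocVarArgs` is a hypothetical helper name, and the struct is repeated without tags so the example compiles on its own:

```go
package main

import "fmt"

// ConvertMetadata mirrors the struct in the diff (tags omitted here).
type ConvertMetadata struct {
	Client        string
	Project       string
	Contractor    string
	ProjectNumber string
}

// pandocVarArgs turns the non-empty fields into "-V", "key=value" argv
// pairs. Each value is one argv element, so no shell quoting is involved.
func pandocVarArgs(m ConvertMetadata) []string {
	var args []string
	add := func(key, val string) {
		if val != "" {
			args = append(args, "-V", key+"="+val)
		}
	}
	add("client", m.Client)
	add("project", m.Project)
	add("contractor", m.Contractor)
	add("project_number", m.ProjectNumber)
	return args
}

func main() {
	args := pandocVarArgs(ConvertMetadata{Client: "Acme", ProjectNumber: "P-42"})
	fmt.Println(args)
}
```

The variable names match the $if(client)$, $if(project)$, $if(contractor)$ and $if(project_number)$ blocks named in the comment; how the real handler orders or escapes them is not shown in this diff.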
// ZddcFile represents the parsed contents of a .zddc configuration file.
//
// Admins is honored only in the root .zddc file (<ZDDC_ROOT>/.zddc); subdir
@@ -157,6 +173,17 @@ type ZddcFile struct {
// directory whose entries they want renamed.
Display map[string]string `yaml:"display,omitempty" json:"display,omitempty"`
// Convert supplies template variables for the server-side
// MD→{docx,html,pdf} conversion endpoint (see internal/convert).
// Cascades leaf→root; per key, the deepest non-empty value wins. Pointer-to-struct
// so unset is distinguishable from "explicitly empty" — relevant
// because the cascade merger needs to know whether a deeper .zddc
// is contributing a value or just inheriting.
//
// Filename-derived variables (title, tracking_number, revision,
// status) come from zddc.ParseFilename and are NOT in this struct.
Convert *ConvertMetadata `yaml:"convert,omitempty" json:"convert,omitempty"`
// Roles are named principal groups available at this level and below.
// See Role for member syntax.
Roles map[string]Role `yaml:"roles,omitempty" json:"roles,omitempty"`
@@ -185,9 +212,9 @@ type ZddcFile struct {
Inherit *bool `yaml:"inherit,omitempty" json:"inherit,omitempty"`
// DefaultTool is the tool name served at this directory's
-// no-slash URL form (e.g. /Project/working without trailing slash
-// → mdedit). Empty means "no default" — the no-slash form 302s to
-// the slash form, which serves DirTool (browse by default).
+// no-slash URL form (e.g. /Project/staging without trailing slash
+// → transmittal). Empty means "no default" — the no-slash form
+// 302s to the slash form, which serves DirTool (browse by default).
// Cascades through Paths: an ancestor's Paths entry can set
// DefaultTool for a virtual descendant without anyone creating
// that dir. This is the "specialized app" half of the slash/no-
@@ -271,8 +298,10 @@ type ZddcFile struct {
// Empty list at every level means "no tools available" (effectively
// blocks all auto-serving); the embedded defaults seed the
// universal baseline of archive/browse/landing at root. Operators
-// can add tools at deeper levels (working/ adds mdedit + classifier,
-// staging/ adds transmittal + classifier, etc.).
+// can add tools at deeper levels (working/ adds classifier,
+// staging/ adds transmittal + classifier, etc.). browse hosts the
+// markdown editor as a plugin so no extra tool is needed under
+// working/ or reviewing/.
//
// This does NOT gate explicit static files: an on-disk
// <dir>/transmittal.html is always served. It gates only the

@@ -46,3 +46,50 @@ func IsTrnOrSubTracking(tracking string) bool {
upper := strings.ToUpper(tracking)
return strings.Contains(upper, "-TRN-") || strings.Contains(upper, "-SUB-")
}
// documentFilenameRE matches the canonical ZDDC document-filename shape:
//
//	<tracking>_<revision> (<status>) - <title>.<ext>
//
// where <tracking> is everything before the first underscore (so it
// cannot contain one; the pattern does not reject spaces), <revision>
// is a run of non-whitespace characters, <status> is anything inside
// the parentheses, and <title> is everything after the dash up to the
// last "." before the extension.
//
// Mirror of the JS parser in shared/zddc.js, kept here for the
// conversion handler, which needs to feed title/tracking/revision/
// status to pandoc as template variables.
var documentFilenameRE = regexp.MustCompile(
	`^([^_]+)_(\S+)\s*\(([^)]+)\)\s*-\s*(.+)\.([^.]+)$`,
)

// ParsedFilename is the result of ParseFilename: tracking number,
// revision, status, title (the text between the dash and the
// extension), and the lowercased extension. Valid is true iff the
// filename matched the canonical pattern.
type ParsedFilename struct {
	TrackingNumber string
	Revision       string
	Status         string
	Title          string
	Extension      string
	Valid          bool
}

// ParseFilename splits a document filename into its ZDDC components.
// It returns Valid=false if the filename doesn't match the canonical
// shape; callers can fall back to stem-based metadata in that case.
func ParseFilename(name string) ParsedFilename {
	m := documentFilenameRE.FindStringSubmatch(name)
	if m == nil {
		return ParsedFilename{}
	}
	return ParsedFilename{
		TrackingNumber: m[1],
		Revision:       m[2],
		Status:         m[3],
		Title:          m[4],
		Extension:      strings.ToLower(m[5]),
		Valid:          true,
	}
}
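
As a quick check of how the pattern splits a name, the following standalone sketch repeats the regex and parser from the hunk and runs them on two made-up filenames. The sample names are illustrative, not taken from the repository:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Same pattern as documentFilenameRE in the diff.
var documentFilenameRE = regexp.MustCompile(
	`^([^_]+)_(\S+)\s*\(([^)]+)\)\s*-\s*(.+)\.([^.]+)$`,
)

type ParsedFilename struct {
	TrackingNumber string
	Revision       string
	Status         string
	Title          string
	Extension      string
	Valid          bool
}

func ParseFilename(name string) ParsedFilename {
	m := documentFilenameRE.FindStringSubmatch(name)
	if m == nil {
		return ParsedFilename{}
	}
	return ParsedFilename{
		TrackingNumber: m[1],
		Revision:       m[2],
		Status:         m[3],
		Title:          m[4],
		Extension:      strings.ToLower(m[5]),
		Valid:          true,
	}
}

func main() {
	// Canonical shape: every component is captured.
	fmt.Printf("%+v\n", ParseFilename("ABC-123_A (IFR) - Pump Datasheet.MD"))
	// No underscore or status: the zero value with Valid=false comes back.
	fmt.Printf("%+v\n", ParseFilename("notes.md"))
}
```

For the first name the parser yields tracking `ABC-123`, revision `A`, status `IFR`, title `Pump Datasheet` and extension `md`; the second does not match, which is the case where callers fall back to stem-based metadata.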

@@ -105,6 +105,29 @@ func mergeOverlay(base, top ZddcFile) ZddcFile {
out.Tables = mergeStringMap(out.Tables, top.Tables)
out.Display = mergeStringMap(out.Display, top.Display)
// Convert: per-key latest-wins. Convert is a pointer-to-struct so an
// absent convert block can be told apart from a present one. Empty
// fields in top contribute nothing: they do NOT clear the ancestor
// value, so an empty string cannot suppress a deployment default;
// operators must set an explicit non-empty string to override.
if top.Convert != nil {
if out.Convert == nil {
out.Convert = &ConvertMetadata{}
}
if top.Convert.Client != "" {
out.Convert.Client = top.Convert.Client
}
if top.Convert.Project != "" {
out.Convert.Project = top.Convert.Project
}
if top.Convert.Contractor != "" {
out.Convert.Contractor = top.Convert.Contractor
}
if top.Convert.ProjectNumber != "" {
out.Convert.ProjectNumber = top.Convert.ProjectNumber
}
}
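
The per-key rule above can be exercised in isolation. This is a minimal sketch, not the code in this commit: `overlayConvert` is a hypothetical helper that applies the same "non-empty top wins" rule but copies the base struct before writing, so the caller's base value is never mutated through the shared pointer. The layer values are made up:

```go
package main

import "fmt"

type ConvertMetadata struct {
	Client        string
	Project       string
	Contractor    string
	ProjectNumber string
}

// overlayConvert returns base with every non-empty field of top applied.
// Empty fields in top leave the base value in place.
func overlayConvert(base, top *ConvertMetadata) *ConvertMetadata {
	if top == nil {
		return base
	}
	out := ConvertMetadata{}
	if base != nil {
		out = *base // copy, so base itself stays untouched
	}
	if top.Client != "" {
		out.Client = top.Client
	}
	if top.Project != "" {
		out.Project = top.Project
	}
	if top.Contractor != "" {
		out.Contractor = top.Contractor
	}
	if top.ProjectNumber != "" {
		out.ProjectNumber = top.ProjectNumber
	}
	return &out
}

func main() {
	root := &ConvertMetadata{Client: "Acme", Project: "Plant A"}
	leaf := &ConvertMetadata{Project: "Unit 7"} // Client left empty: inherited
	eff := overlayConvert(overlayConvert(nil, root), leaf)
	fmt.Printf("%+v\n", *eff)
}
```

Here the effective set keeps Client from the lower-priority layer and takes Project from the higher-priority one. Whether the in-place writes in the hunk can alias the base struct depends on how out is constructed earlier in mergeOverlay, which these lines do not show.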
// Roles: per-name merge (top wins on name clash). This combines
// the on-disk .zddc at a level with any virtual contributions
// from ancestor paths: at the same level. Cross-LEVEL role