ZDDC/pandoc/README.md
ZDDC c765fe9183 feat(pandoc): named doctype templates + front-matter numbering toggle
Replace the single always-numbered viewer-template.html with a templates/
directory of named doctype templates that share partials:

- templates/_head.html  — <head> + all CSS (numbering CSS now scoped behind a
  body.numbered class instead of being applied unconditionally)
- templates/_doc.html   — shared TOC-sidebar body (report/specification)
- templates/_scripts.html — shared JS
- templates/{report,specification}.html — TOC-layout doctypes
- templates/letter.html — single-column letterhead, no TOC

A document selects its template with `template: <name>` in YAML front matter
(default report) and turns on legal numbering with `numbering: true` (default
off). Pandoc passes both fields straight from the front matter — the numbering
toggle needs no converter code. Retire custom.css (folded into _head.html,
gated) and the old viewer-template.html.

CLI: convert md→html resolves templates/<name>.html (name from front matter,
sanitized, default report); convert-diff uses templates/report.html and no
longer passes --css=custom.css. README updated.

Server (zddc/internal/convert) still uses its own embedded copy and is
unchanged here; it migrates to this templates/ dir in the next commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 14:07:36 -05:00


# ZDDC Pandoc Tools
A collection of tools for converting Markdown documents to HTML with a professional viewer interface, optimized for technical documentation and engineering documents.
## Server-side conversion (`zddc-server`)
> The shell scripts in this folder are standalone CLI/batch tools. `zddc-server`
> implements its **own** on-demand conversion (Go package `zddc/internal/convert`)
> and does **not** call these scripts. It does, however, reuse the same
> `templates/` (embedded at build time). See AGENTS.md → "Server-side document
> conversion" for the authoritative reference.

zddc-server can render any served `.md` on demand: requesting the sibling URL
`<path>/foo.docx` (or `.html` / `.pdf`) returns the converted bytes — no query
string. A real on-disk file of that name always wins; the virtual conversion
only fires when the requested file doesn't exist but `foo.md` does. The browse
app's markdown editor surfaces these as DOCX/HTML/PDF download links (auto-saving
a dirty buffer first so the output matches what's on screen).
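
The precedence rule above (a real file wins, a virtual conversion fires only when the sibling `.md` exists) can be sketched as a small shell function. This is an illustration of the rule only, not the server's Go code; `resolve_request` and its return labels are hypothetical names.

```sh
# Hypothetical helper: decide how a request for the filesystem path $1
# would be served. Prints "static", "convert", or "not-found".
resolve_request() {
    req=$1
    # A real on-disk file of that name always wins.
    if [ -f "$req" ]; then
        echo static
        return
    fi
    case $req in
        *.docx|*.html|*.pdf)
            # Virtual conversion: only when the sibling .md exists.
            if [ -f "${req%.*}.md" ]; then
                echo convert
                return
            fi
            ;;
    esac
    echo not-found
}
```

For example, with only `foo.md` on disk, `resolve_request ./foo.pdf` prints `convert`; once a real `foo.pdf` is written next to it, the same call prints `static`.
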

**Architecture.** The Go code does the minimum — it `exec`s `pandoc` and
`chromium-browser` directly. The sandbox and resource caps live in the runtime
**image**, where `/usr/local/bin/{pandoc,chromium-browser}` are wrapper scripts
that run the real binary inside a per-conversion bubblewrap sandbox
(`--unshare-all`, read-only binds, `--tmpfs /tmp`, `--clearenv`) under cgroup v2
memory/PID caps. I/O is via stdin/stdout plus a per-call scratch dir. There is no
container runtime and no image pulling at request time.
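
As a rough sketch of what such a wrapper might assemble, the function below prints a `bwrap` command line using the flags named above. The real binary path, the `/scratch` mount point, and the single `/usr` bind are assumptions for illustration; the actual wrapper in the image likely needs more binds, and the cgroup v2 caps are not shown.

```sh
# Hypothetical location of the real binary hidden behind the wrapper.
REAL_PANDOC=${REAL_PANDOC:-/usr/libexec/pandoc-real}

# Print the bwrap command line a wrapper might exec for one conversion.
# $1 is the per-call scratch dir; remaining args go to the real binary.
sandboxed_pandoc_cmd() {
    scratch=$1
    shift
    echo bwrap \
        --unshare-all \
        --clearenv \
        --ro-bind /usr /usr \
        --tmpfs /tmp \
        --bind "$scratch" /scratch \
        --chdir /scratch \
        "$REAL_PANDOC" "$@"
}
```

Printing rather than executing keeps the sketch inspectable on a machine without bubblewrap; a real wrapper would `exec` the same argument list.
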

The PDF flow is two-stage: pandoc renders the markdown through the selected
`templates/<doctype>.html` to standalone HTML, then headless Chromium prints that
HTML to PDF — preserving the template's print-media CSS rather than going through
pandoc's LaTeX template.
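
The two stages can be approximated by hand as follows. This is a minimal sketch, assuming pandoc's `--standalone`/`--template`/`--output` options and Chromium's `--headless`/`--print-to-pdf` switches; the function name, the `intermediate.html` filename, and the `PANDOC`/`CHROMIUM` overrides are hypothetical, and the server's exact arguments may differ.

```sh
PANDOC=${PANDOC:-pandoc}
CHROMIUM=${CHROMIUM:-chromium-browser}

# md_to_pdf SRC.md TEMPLATE.html OUT.pdf SCRATCH_DIR
# SCRATCH_DIR should be an absolute path so the file:// URL is valid.
md_to_pdf() {
    src=$1 template=$2 out=$3 scratch=$4
    html=$scratch/intermediate.html
    # Stage 1: markdown -> standalone HTML through the doctype template.
    "$PANDOC" --standalone --template="$template" --output="$html" "$src" || return 1
    # Stage 2: headless Chromium prints that HTML to PDF, so the
    # template's print-media CSS is what shapes the pages.
    "$CHROMIUM" --headless --print-to-pdf="$out" "file://$html"
}
```

A call might look like `md_to_pdf foo.md templates/report.html /tmp/foo.pdf /tmp/scratch`.
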

Converted bytes are cached at `<dir>/.zddc.d/converted/<base>.<ext>` with mtime
synced to the source, so a fresh cache hit is a stat-and-serve with no `exec`.
A PUT/DELETE/MOVE on the source `.md` purges the sidecars. Per-project header
metadata (client/project/contractor/project_number) comes from the `.zddc`
`convert:` cascade; title/tracking_number/revision/status are derived from the
filename via `zddc.ParseFilename`.
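
The cache layout and the mtime check can be illustrated in shell. The sketch assumes that "fresh" means the cached file's mtime equals the source's, which is one reading of the description above; the function names are hypothetical and the Go implementation may compare differently.

```sh
# cache_path SRC.md EXT -> <dir>/.zddc.d/converted/<base>.<ext>
cache_path() {
    dir=$(dirname "$1")
    base=$(basename "$1" .md)
    echo "$dir/.zddc.d/converted/$base.$2"
}

# cache_fresh SRC CACHED: true when CACHED exists and its mtime equals SRC's.
cache_fresh() {
    [ -f "$2" ] || return 1
    # Equal mtimes means neither file is newer than the other.
    [ -z "$(find "$1" -newer "$2")" ] && [ -z "$(find "$2" -newer "$1")" ]
}

# cache_store SRC CACHED: after writing CACHED, sync its mtime to SRC's.
cache_store() {
    touch -r "$1" "$2"
}
```

Under this reading, any later write to the source changes its mtime and makes `cache_fresh` fail, so a stale sidecar is never served even if the purge step was missed.
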

Relevant flags (defaults in parens):
- `--convert-pandoc-binary` (`pandoc`) / `--convert-chromium-binary`
(`chromium-browser`; `chromium` on Debian) — PATH-resolved name or absolute path
- `--convert-scratch-dir` (`$TMPDIR`) — host scratch root for template + intermediates
- `--convert-mem-mib` (`1024`) — per-conversion memory cap (cgroup `memory.max`)
- `--convert-pids` (`256`) — per-conversion PID cap (cgroup `pids.max`)
- `--convert-timeout` (`60s`) — per-conversion wall clock (Go `context.WithTimeout`)

If `pandoc`/`chromium` aren't on PATH (e.g. running zddc-server outside the runtime
image) the endpoint serves 503 with a `Retry-After`; the rest of the server keeps
working. Running against raw pandoc/chromium with no wrapper gives a working but
**unsandboxed** endpoint — fine for dev iteration.
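
A quick preflight along the same lines can be run by hand before starting the server. The sketch below assumes the check is a plain PATH lookup; the function names are hypothetical, and `200` merely stands in for "conversion would be attempted".

```sh
# converters_available BIN...: succeeds only if every named binary
# resolves, either through PATH or as an absolute path.
converters_available() {
    for bin in "$@"; do
        command -v "$bin" >/dev/null 2>&1 || return 1
    done
}

# Map availability to the status the convert endpoint would answer with.
convert_status() {
    if converters_available "$@"; then echo 200; else echo 503; fi
}
```

For instance, `convert_status pandoc chromium-browser` prints `503` on a host where either binary is missing.
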
## Features
### Document Conversion (`convert`)
- **Batch processing**: Convert multiple Markdown files at once
- **Force overwrite**: `-f` flag to overwrite existing output files
- **Custom output directory**: `-o` flag to specify output location
- **Configuration-driven**: Uses `zddc.conf` for project-specific settings
- **Template integration**: Automatically applies the doctype template named in the document's front matter (default `report`)
- **Progress tracking**: Real-time conversion status and summary
### Professional templates (`templates/`)
Named doctype templates (`report.html`, `letter.html`, `specification.html`)
share `_head.html` / `_doc.html` / `_scripts.html` partials. A document selects one
with a `template:` field in its YAML front matter (default `report`), and turns on
legal-style heading numbering with `numbering: true` (default off). Both fields are
read by pandoc straight from the front matter. Server deployments additionally
resolve per-project/per-party overrides from `.zddc.d/templates/<name>.html`.
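
To make the `template:` lookup concrete, here is a minimal sketch of reading the field from front matter, sanitizing it, and falling back to `report`. The function name and the allowed character set are assumptions; the `convert` script's own parsing may differ.

```sh
# Example front matter this reads:
#   ---
#   template: letter
#   numbering: true
#   ---
#
# template_name FILE.md: print the sanitized `template:` value from the
# YAML front matter, or "report" when the field is absent or empty.
template_name() {
    name=$(awk '
        NR == 1 && $0 != "---" { exit }                   # no front matter
        NR > 1 && ($0 == "---" || $0 == "...") { exit }   # end of front matter
        /^template:/ { sub(/^template:[ \t]*/, ""); print; exit }
    ' "$1" | tr -cd 'A-Za-z0-9_-')
    echo "${name:-report}"
}
```

The resolved file would then be `templates/$(template_name doc.md).html`. Stripping everything outside letters, digits, `_`, and `-` keeps a value such as `../../etc/passwd` from escaping the `templates/` directory.
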
- **Modern responsive design**: Works on desktop, tablet, and mobile
- **Table of Contents (TOC)**: Auto-generated sidebar navigation with smooth scrolling
- **Print optimization**: Professional formatting for PDF generation
  - Page break controls for tables
  - Repeating table headers
  - Proper page numbering
  - Clean print layout
- **URL hash navigation**: Shareable links to specific document sections
- **Mobile-friendly**: Collapsible sidebar and touch-optimized interface
- **Professional styling**: Clean typography optimized for technical documents
## Usage
### Basic Conversion
```bash
# Convert all Markdown files in current directory
./convert *.md
# Convert with force overwrite
./convert -f *.md
# Convert to specific output directory
./convert -o rendered/ *.md
# Combine flags
./convert -f -o rendered/ *.md
```
### Configuration (`zddc.conf`)
Create a `zddc.conf` file in your project directory. It is **sourced as shell**,
so use `var="value"` syntax (no spaces around `=`). Only these four variables are
read; all are optional and feed the document header via pandoc `--variable`:
```sh
contractor="Contractor Name" # contracting organization (header)
client="Client Name" # client org (header, paired with project)
project="Project Name" # full project name
project_number="AR 28088" # shown in parentheses after the project name
```
The template path is discovered automatically (input dir → script dir →
symlink target) or set per-run with `-T`; the output directory is set with `-o`.
They are **not** `zddc.conf` keys.
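
Because the file is sourced as shell, turning it into pandoc arguments is mostly a loop over the four keys. The sketch below is an assumption about how that could be done, not the `convert` script's code; `header_vars` is a hypothetical name, and the `key:value` form is one of the separators pandoc's `--variable` accepts.

```sh
# header_vars CONF: source CONF in a subshell and print one --variable
# argument per line for each of the four keys that is set and non-empty.
# Pass a path containing a slash (e.g. ./zddc.conf) so `.` does not
# search PATH for the file.
header_vars() (
    contractor= client= project= project_number=
    [ -f "$1" ] && . "$1"
    for key in contractor client project project_number; do
        eval "val=\$$key"
        [ -n "$val" ] && printf '%s\n' "--variable=$key:$val"
    done
    :
)
```

Running the body in a subshell keeps the sourced variables from leaking into the caller. Since the file is executed as code, it should only come from a trusted project directory.
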
### Directory Structure
```
your-project/
├── zddc.conf # Configuration file
├── document1.md # Source Markdown files
├── document2.md
└── rendered/ # Generated HTML files
    ├── document1.html
    └── document2.html
```
## Template Features
### Navigation
- **TOC Generation**: Automatically creates navigation from document headings
- **Smooth Scrolling**: Click TOC items for smooth navigation to sections
- **Hash URLs**: Address bar updates with section anchors for sharing
- **Mobile Menu**: Collapsible sidebar for mobile devices
### Print Styling
- **Page Breaks**: Tables won't split across pages
- **Header Repetition**: Table headers repeat on each page
- **Professional Layout**: Optimized margins and typography
- **Page Numbers**: Sequential page numbering in footer
### Responsive Design
- **Desktop**: Full sidebar with TOC always visible
- **Tablet**: Collapsible sidebar with overlay
- **Mobile**: Hamburger menu with full-screen TOC overlay
## File Types Supported
- **Input**: Markdown (`.md`), DOCX (`.docx`), and HTML (`.html`/`.htm`) files
(auto-detected: DOCX→MD, MD→HTML, HTML→MD; override with `-t md|html|docx`).
Direct DOCX→HTML is not supported — convert to MD first.
- **Output**: HTML files with embedded CSS and JavaScript (plus MD and DOCX targets)
- **Images**: Supports embedded images and diagrams
- **Tables**: Full table support with print optimization
- **Code**: Syntax highlighting for code blocks
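
The auto-detection rule in the first bullet amounts to a lookup on the input extension. A minimal sketch follows, with `default_target` as a hypothetical name and case-sensitive matching as an assumption:

```sh
# default_target FILE: print the auto-detected target format for FILE,
# following DOCX->MD, MD->HTML, HTML->MD.
default_target() {
    case $1 in
        *.docx)        echo md ;;
        *.md)          echo html ;;
        *.html|*.htm)  echo md ;;
        *)             echo "unsupported input: $1" >&2; return 1 ;;
    esac
}
```

An explicit `-t md|html|docx` would simply replace the value this function returns.
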
## Dependencies
- **pandoc**: Document conversion engine
- **Modern browser**: For viewing generated HTML files
- **Optional**: Web server for serving files (prevents CORS issues)
## Troubleshooting
### Common Issues
1. **Template not found**: Keep the `templates/` directory beside the script (or input), or pass `-T /path/to/template.html`
2. **Permission errors**: Make sure the `convert` script is executable (`chmod +x convert`)
3. **Missing output**: Check that output directory exists or use `-o` to create it
4. **Print issues**: Use "Print to PDF" in browser for best results