ZDDC/pandoc/README.md
ZDDC b5aab81d31 feat(zddc): MD→{docx,html,pdf} server-side conversion via stock pandoc + chromium containers
New endpoint GET /<path>/foo.md?convert=docx|html|pdf renders a markdown
source on demand. Surfaced as the Download buttons in browse's markdown
editor (separate commit).

Execution model — two upstream container images, lazy-pulled:

  • docker.io/pandoc/latex:latest  — MD→DOCX, MD→HTML (entrypoint pandoc)
  • docker.io/zenika/alpine-chrome — HTML→PDF (entrypoint chromium-browser)

No custom image build. The runner passes --pull=missing on every podman/
docker invocation so the operator only needs the runtime installed —
first request pulls the image, subsequent requests use the local cache.
Overrides: --convert-pandoc-image / --convert-chromium-image (and the
matching ZDDC_CONVERT_* env vars). Engine: --convert-engine (podman
preferred, docker fallback). Resource caps: --convert-mem-mib (512),
--convert-cpus (2), --convert-pids (100), --convert-timeout (30s).

PDF flow is two-stage: pandoc renders the markdown through the embedded
viewer-template.html to standalone HTML, then chromium prints that HTML
via --print-to-pdf. Preserves the print-media CSS already authored in
viewer-template.html rather than going through pandoc's LaTeX template.

Each conversion runs in a throw-away container with --rm --network=none
--read-only --tmpfs=/tmp --cap-drop=ALL --security-opt=no-new-privileges
--env=HOME=/tmp plus a bind-mounted scratch dir for I/O. Pandoc reads
markdown from stdin / writes to stdout; the viewer template lives at
/tpl (ro). Chromium reads HTML from a read-write bind mount at /pdf
and writes the PDF to the same mount; the host reads it back. No shell
wrappers, no shell quoting — argv flows straight into each image's
entrypoint.

On-disk cache at <dir>/.converted/<base>.<ext> with mtime synced to the
source. Fast path is a stat-and-serve with no exec; slow path
singleflights concurrent requests for the same target. PUT/DELETE/MOVE
on the source .md purges the .converted/ sidecars.

Per-project template variables (client/project/contractor/project_number)
come from a new .zddc `convert:` cascade block, walked leaf→root with
per-key latest-wins. Filename-derived variables (title, tracking_number,
revision, status, is_draft) come from a new zddc.ParseFilename helper.

If neither podman nor docker is on PATH, the endpoint serves 503 with
a clear Retry-After. The rest of the server keeps working.

This is the first os/exec site in the codebase. The hardening in
internal/convert/runner.go — context.CancelFunc → process kill,
cmd.WaitDelay, platform-specific SysProcAttr (Setpgid + Pdeathsig on
Linux), minimal env, stdout cap via limitWriter, stderr ring buffer —
sets the pattern for any future shell-outs.

Public surface:
  convert.ToDocx(ctx, source, meta) / .ToHTML / .ToPDF
  convert.Probe(ctx, engineOverride) → install Runner if engine present
  convert.SetImages(pandoc, chromium)
  convert.ConfigureLimits(memMiB, cpus, pids, timeout)
  convert.Available()

Container handler at internal/handler/converthandler.go; dispatcher
branch in cmd/zddc-server/main.go inserts the convert lookup after the
existing ACL gate, reusing the source file's read policy verbatim.
2026-05-13 10:33:56 -05:00


# ZDDC Pandoc Tools
A collection of tools for converting Markdown documents to HTML with a professional viewer interface, optimized for technical and engineering documents.
## Server-side conversion (`zddc-server`)
zddc-server can also run conversions on demand: a `.md` file in any
served directory becomes downloadable as `.docx`, `.html`, or `.pdf` via the
`?convert=` query parameter (for example `GET /<path>/foo.md?convert=pdf`),
surfaced as Download buttons in the browse app's markdown editor.
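
A minimal sketch of how a handler might validate the `convert` value and derive the download filename and `Content-Type`. The function name `resolveTarget` is hypothetical; the MIME types are the standard registered ones for each format.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// contentTypes lists the accepted ?convert= values and the Content-Type
// served for each (standard registered MIME types).
var contentTypes = map[string]string{
	"docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
	"html": "text/html; charset=utf-8",
	"pdf":  "application/pdf",
}

// resolveTarget (hypothetical) checks that srcPath is a markdown source and
// that convert is a supported format, then returns the download name.
func resolveTarget(srcPath, convert string) (filename, contentType string, err error) {
	if !strings.HasSuffix(srcPath, ".md") {
		return "", "", fmt.Errorf("%s: not a markdown source", srcPath)
	}
	ct, ok := contentTypes[convert]
	if !ok {
		return "", "", fmt.Errorf("unsupported convert value %q", convert)
	}
	base := strings.TrimSuffix(path.Base(srcPath), ".md")
	return base + "." + convert, ct, nil
}

func main() {
	name, ctype, err := resolveTarget("/projects/alpha/spec.md", "pdf")
	fmt.Println(name, ctype, err)
}
```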

The server shells out to two upstream container images, pulling each on
first use via `--pull=missing`. No custom image build is required:
operators only need `podman` (preferred) or `docker` installed, and the
first conversion request pulls the image. The two images are:

- `docker.io/pandoc/latex:latest` for MD → DOCX and MD → HTML
  (override: `--convert-pandoc-image=` or `ZDDC_CONVERT_PANDOC_IMAGE`;
  switch to `docker.io/pandoc/core:latest` for a ~90% size reduction
  if you don't need pandoc's native LaTeX-PDF path)
- `docker.io/zenika/alpine-chrome:latest` for HTML → PDF
  (override: `--convert-chromium-image=` or `ZDDC_CONVERT_CHROMIUM_IMAGE`)

The PDF flow is two-stage: pandoc renders the markdown through
`viewer-template.html` to standalone HTML, then headless Chromium
prints that HTML to PDF. This preserves the existing print-media CSS
authored for the viewer template rather than going through pandoc's
LaTeX template.
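
The two stages can be sketched as the argument lists handed to each image's entrypoint. This assumes the template directory is mounted read-only at `/tpl`, the scratch directory read-write at `/pdf`, and that pandoc reads markdown on stdin and writes HTML to stdout. The pandoc and Chromium flags used are standard options of those tools, but that zddc-server passes exactly these is an assumption; the helper and file names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// Mount points inside the containers (assumed).
const (
	tplDir = "/tpl" // viewer template, read-only
	pdfDir = "/pdf" // scratch dir shared with the host, read-write
)

// pandocHTMLArgs: stage 1, markdown on stdin -> standalone HTML on stdout,
// rendered through the viewer template.
func pandocHTMLArgs() []string {
	return []string{
		"--from=markdown",
		"--to=html5",
		"--standalone",
		"--template=" + path.Join(tplDir, "viewer-template.html"),
	}
}

// chromiumPDFArgs: stage 2. The host writes stage 1's HTML into the scratch
// dir as inName; Chromium prints it to outName in the same dir, and the host
// reads the PDF back from there.
func chromiumPDFArgs(inName, outName string) []string {
	return []string{
		"--headless",
		"--print-to-pdf=" + path.Join(pdfDir, outName),
		"file://" + path.Join(pdfDir, inName),
	}
}

func main() {
	fmt.Println(pandocHTMLArgs())
	fmt.Println(chromiumPDFArgs("in.html", "out.pdf"))
}
```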

If neither podman nor docker is on `PATH`, the endpoint serves 503 with
a clear "no container runtime" message; the rest of the server keeps
working. Engine choice is overridable via `--convert-engine=` or
`ZDDC_CONVERT_ENGINE`.

Resource limits are per-container and configurable: `--convert-mem-mib`
(default 512), `--convert-cpus` (default "2"), `--convert-pids`
(default 100), `--convert-timeout` (default 30s).

Each conversion runs in a throw-away container with
`--rm --network=none --read-only --tmpfs=/tmp --cap-drop=ALL
--security-opt=no-new-privileges` plus a bind-mounted scratch dir
for I/O (read-only for the template; read-write for the PDF output).
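
Putting the isolation flags together, a runner might assemble each invocation as below. `--pull=missing` and `--env=HOME=/tmp` are included because the runner is described as passing them; the helper names and host paths are hypothetical. The argv goes straight to the engine with no shell in between, so nothing needs quoting.

```go
package main

import "fmt"

// Flags applied to every conversion container.
var sandboxFlags = []string{
	"--rm", "--pull=missing", "--network=none", "--read-only",
	"--tmpfs=/tmp", "--cap-drop=ALL", "--security-opt=no-new-privileges",
	"--env=HOME=/tmp",
}

// bind returns a bind-mount flag; readOnly selects ":ro" versus ":rw".
func bind(hostDir, ctrDir string, readOnly bool) string {
	mode := "rw"
	if readOnly {
		mode = "ro"
	}
	return fmt.Sprintf("--volume=%s:%s:%s", hostDir, ctrDir, mode)
}

// runArgs assembles the argv after the engine binary: "run", the sandbox
// flags, the mounts, the image, then the image entrypoint's arguments.
func runArgs(image string, mounts []string, entrypointArgs ...string) []string {
	args := append([]string{"run"}, sandboxFlags...) // copies; sandboxFlags untouched
	args = append(args, mounts...)
	args = append(args, image)
	return append(args, entrypointArgs...)
}

func main() {
	// Pandoc stage: sees only the template, read-only.
	fmt.Println(runArgs("docker.io/pandoc/latex:latest",
		[]string{bind("/srv/zddc/tpl", "/tpl", true)}, "--standalone"))
	// Chromium stage: scratch dir read-write for the PDF output.
	fmt.Println(runArgs("docker.io/zenika/alpine-chrome:latest",
		[]string{bind("/tmp/zddc-scratch", "/pdf", false)}, "--print-to-pdf=/pdf/out.pdf"))
}
```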
## Features
### Document Conversion (`convert`)
- **Batch processing**: Convert multiple Markdown files at once
- **Force overwrite**: `-f` flag to overwrite existing output files
- **Custom output directory**: `-o` flag to specify output location
- **Configuration-driven**: Uses `zddc.conf` for project-specific settings
- **Template integration**: Automatically applies the viewer template
- **Progress tracking**: Real-time conversion status and summary
### Professional Viewer Template (`viewer-template.html`)
- **Modern responsive design**: Works on desktop, tablet, and mobile
- **Table of Contents (TOC)**: Auto-generated sidebar navigation with smooth scrolling
- **Print optimization**: Professional formatting for PDF generation
  - Page break controls for tables
  - Repeating table headers
  - Proper page numbering
  - Clean print layout
- **URL hash navigation**: Shareable links to specific document sections
- **Mobile-friendly**: Collapsible sidebar and touch-optimized interface
- **Professional styling**: Clean typography optimized for technical documents
## Usage
### Basic Conversion
```bash
# Convert all Markdown files in current directory
./convert *.md
# Convert with force overwrite
./convert -f *.md
# Convert to specific output directory
./convert -o rendered/ *.md
# Combine flags
./convert -f -o rendered/ *.md
```
### Configuration (`zddc.conf`)
Create a `zddc.conf` file in your project directory:
```ini
# Project metadata
title = "Project Documentation"
author = "Your Organization"
date = "2024"
# Template settings
template = "/path/to/viewer-template.html"
css = "custom-styles.css"
# Output settings
output_dir = "rendered"
```
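
A minimal Go sketch of reading this format, assuming it is limited to `#` comments, blank lines, and `key = value` pairs with optional double quotes. `parseConf` is hypothetical; the real `convert` tool may accept more.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseConf reads zddc.conf-style text into a key/value map.
func parseConf(src string) (map[string]string, error) {
	conf := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(src))
	for n := 1; sc.Scan(); n++ {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line or comment
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok {
			return nil, fmt.Errorf("line %d: expected key = value", n)
		}
		key, val = strings.TrimSpace(key), strings.TrimSpace(val)
		// Strip one pair of surrounding double quotes, if present.
		if len(val) >= 2 && strings.HasPrefix(val, `"`) && strings.HasSuffix(val, `"`) {
			val = val[1 : len(val)-1]
		}
		conf[key] = val
	}
	return conf, sc.Err()
}

func main() {
	conf, err := parseConf("# Project metadata\ntitle = \"Project Documentation\"\noutput_dir = \"rendered\"\n")
	fmt.Println(conf["title"], conf["output_dir"], err)
}
```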
### Directory Structure
```
your-project/
├── zddc.conf        # Configuration file
├── document1.md     # Source Markdown files
├── document2.md
└── rendered/        # Generated HTML files
    ├── document1.html
    └── document2.html
```
## Template Features
### Navigation
- **TOC Generation**: Automatically creates navigation from document headings
- **Smooth Scrolling**: Click TOC items for smooth navigation to sections
- **Hash URLs**: Address bar updates with section anchors for sharing
- **Mobile Menu**: Collapsible sidebar for mobile devices
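
Section links are only shareable if heading IDs are predictable. Assuming the template relies on the IDs pandoc assigns through its default `auto_identifiers` extension, the rule can be approximated as below, so that "3. Applications" becomes `#applications`. `headingID` is a hypothetical helper; duplicate headings, which pandoc disambiguates with numeric suffixes, are not handled.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// headingID approximates pandoc's auto_identifiers for plain heading text:
// keep letters, numbers, '_', '-', '.'; turn whitespace into '-'; lowercase;
// drop everything before the first letter; fall back to "section".
func headingID(text string) string {
	var b strings.Builder
	for _, r := range text {
		switch {
		case unicode.IsSpace(r):
			b.WriteRune('-')
		case unicode.IsLetter(r) || unicode.IsNumber(r) || r == '_' || r == '-' || r == '.':
			b.WriteRune(unicode.ToLower(r))
		}
	}
	id := b.String()
	if i := strings.IndexFunc(id, unicode.IsLetter); i >= 0 {
		return id[i:]
	}
	return "section"
}

func main() {
	for _, h := range []string{"Heading identifiers in HTML", "3. Applications", "33"} {
		fmt.Printf("%q -> #%s\n", h, headingID(h))
	}
}
```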
### Print Styling
- **Page Breaks**: Tables won't split across pages
- **Header Repetition**: Table headers repeat on each page
- **Professional Layout**: Optimized margins and typography
- **Page Numbers**: Sequential page numbering in footer
### Responsive Design
- **Desktop**: Full sidebar with TOC always visible
- **Tablet**: Collapsible sidebar with overlay
- **Mobile**: Hamburger menu with full-screen TOC overlay
## Advanced Usage
### Custom Templates
You can customize the viewer template by:
1. Copying `viewer-template.html` to your project
2. Modifying the CSS and HTML structure
3. Updating `zddc.conf` to point to your custom template
### Batch Processing
For large document sets:
```bash
# Process all markdown files recursively
find . -name "*.md" -exec ./convert -f -o rendered/ {} +
# Process specific document types
./convert -f -o rendered/ *-SOW-*.md *-DBD-*.md
```
### Integration with Build Systems
The `convert` tool exits with a non-zero status when a conversion fails, so it can be used as a step in CI/CD pipelines:
```bash
# In a build script
if ./convert -f -o dist/ *.md; then
    echo "Documentation built successfully"
else
    echo "Documentation build failed"
    exit 1
fi
```
## File Types Supported
- **Input**: Markdown (`.md`) files with pandoc extensions
- **Output**: HTML files with embedded CSS and JavaScript
- **Images**: Supports embedded images and diagrams
- **Tables**: Full table support with print optimization
- **Code**: Syntax highlighting for code blocks
## Dependencies
- **pandoc**: Document conversion engine
- **Modern browser**: For viewing generated HTML files
- **Optional**: Web server for serving files (prevents CORS issues)
## Troubleshooting
### Common Issues
1. **Template not found**: Ensure `zddc.conf` points to the correct template path
2. **Permission errors**: Make sure the `convert` script is executable (`chmod +x convert`)
3. **Missing output**: Check that the output directory exists, or use `-o` to create it
4. **Print issues**: Use "Print to PDF" in browser for best results
### Performance
- Large documents (>1000 pages) may take longer to render
- Consider splitting very large documents into sections
- Use batch processing for multiple files
## Examples
### Engineering Documentation
Perfect for:
- Design basis documents
- Specifications and standards
- Project requirements
- Technical procedures
- Quality documentation
### Features Optimized For
- **Professional appearance**: Clean, corporate styling
- **Technical content**: Tables, diagrams, code blocks
- **Print output**: PDF generation with proper formatting
- **Navigation**: Easy browsing of long documents
- **Sharing**: URL fragments for referencing specific sections