ZDDC/pandoc/README.md
ZDDC d10cd23076 fix(pandoc): correctness, robustness & doc cleanup of convert tools
Audit-driven cleanup of the standalone pandoc/ CLI tools (no changes to
the server's own zddc/internal/convert engine).

convert:
- DOCX→MD now reads lowercase client/project from zddc.conf (was $CLIENT/
  $PROJECT, always empty)
- ZDDC filename parsing via a shared parse_zddc_filename helper that
  extracts each field with its own backref, so a '|' in the title no
  longer truncates it (was cut -d'|')
- drop duplicate --section-divs and no-op --id-prefix=

convert-diff:
- replace hardcoded "(AR 28088)" in the diff header with the configured
  $project_number (omitted when unset)
- only pass --template when one was found (empty --template= errors out)
- drop the false "Loading ZDDC configuration" log and the sed quote-escape
  that leaked backslashes into custom_header
- remove dead REV_A/REV_B and rev*_date extraction; fix usage typo;
  pin LC_TIME=C on date calls

index.sh:
- relative_path passes paths to python via argv (no -c interpolation) and
  uses realpath --relative-to as the fallback instead of an absolute path
- escape '|' in title/status before emitting the markdown table row

README:
- rewrite the stale server-side section to match the real binary+bubblewrap
  design and flags/defaults (was a non-existent podman/docker/image design)
- fix the invalid zddc.conf example (sourced shell, four real vars) and the
  understated input-format list

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 10:53:26 -05:00


# ZDDC Pandoc Tools
A collection of tools for converting between Markdown, DOCX, and HTML documents. Markdown is rendered to HTML through a professional viewer template optimized for technical and engineering documentation.
## Server-side conversion (`zddc-server`)
> The shell scripts in this folder are standalone CLI/batch tools. `zddc-server`
> implements its **own** on-demand conversion (Go package `zddc/internal/convert`)
> and does **not** call these scripts. It does, however, reuse the same
> `viewer-template.html` and `custom.css` (embedded at build time). See
> AGENTS.md → "Server-side document conversion" for the authoritative reference.

zddc-server can render any served `.md` on demand: requesting the sibling URL
`<path>/foo.docx` (or `.html` / `.pdf`) returns the converted bytes — no query
string. A real on-disk file of that name always wins; the virtual conversion
only fires when the requested file doesn't exist but `foo.md` does. The browse
app's markdown editor surfaces these as DOCX/HTML/PDF download links (auto-saving
a dirty buffer first so the output matches what's on screen).
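
The lookup order described above can be sketched in shell. This is only an illustration of the rule (real file wins, otherwise fall back to a sibling `.md`); the function name `resolve_request` and its `static` / `convert` / `not-found` output labels are hypothetical and do not come from the Go implementation.

```shell
# Decide how a requested path would be served (illustrative sketch only).
# Prints "static <path>", "convert <source.md>", or "not-found".
resolve_request() {
  path=$1
  # A real on-disk file of that name always wins.
  if [ -f "$path" ]; then
    printf 'static %s\n' "$path"
    return 0
  fi
  # Otherwise a virtual conversion fires only if the sibling .md exists.
  case $path in
    *.docx | *.html | *.pdf)
      source_md=${path%.*}.md
      if [ -f "$source_md" ]; then
        printf 'convert %s\n' "$source_md"
        return 0
      fi
      ;;
  esac
  printf 'not-found\n'
  return 1
}
```

With only `foo.md` on disk, a request for `foo.pdf` resolves to a conversion; once a real `foo.pdf` exists, the same request resolves to the static file.
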
**Architecture.** The Go code does the minimum — it `exec`s `pandoc` and
`chromium-browser` directly. The sandbox and resource caps live in the runtime
**image**, where `/usr/local/bin/{pandoc,chromium-browser}` are wrapper scripts
that run the real binary inside a per-conversion bubblewrap sandbox
(`--unshare-all`, read-only binds, `--tmpfs /tmp`, `--clearenv`) under cgroup v2
memory/PID caps. I/O is via stdin/stdout plus a per-call scratch dir. There is no
container runtime and no image pulling at request time.
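
As a rough idea of what such a wrapper might assemble, the sketch below prints a `bwrap` command line built from the flags named above. It is not the wrapper shipped in the image: the function name `sandbox_argv`, the `/work` mount point, and the single `/usr` bind are assumptions, the cgroup v2 caps are omitted, and a working sandbox would likely need further read-only binds.

```shell
# Print, one argument per line, a bwrap command a wrapper might exec.
# real_bin and scratch are hypothetical parameters, not paths from the image.
sandbox_argv() {
  real_bin=$1
  scratch=$2
  shift 2
  printf '%s\n' bwrap \
    --unshare-all --clearenv \
    --ro-bind /usr /usr \
    --tmpfs /tmp \
    --bind "$scratch" /work --chdir /work \
    "$real_bin" "$@"
}
```

Printing the arguments instead of executing them keeps the sketch usable on machines without bubblewrap; an actual wrapper would `exec` the same argument list.
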
The PDF flow is two-stage: pandoc renders the markdown through
`viewer-template.html` to standalone HTML, then headless Chromium prints that HTML
to PDF — preserving the viewer template's print-media CSS rather than going
through pandoc's LaTeX template.
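
The two stages can be approximated by hand as follows. This is a minimal, unsandboxed sketch and not the server's code path: the function name `md_to_pdf` and the `PANDOC` / `CHROMIUM` override variables are hypothetical, and the exact arguments the server passes to each tool are not shown in this README.

```shell
# Two-stage Markdown -> HTML -> PDF conversion (illustrative sketch).
# PANDOC and CHROMIUM are hypothetical overrides for the binary names.
md_to_pdf() {
  src=$1
  template=$2
  out=$3
  work=$(mktemp -d) || return 1
  # Stage 1: pandoc renders the markdown through the viewer template.
  "${PANDOC:-pandoc}" --standalone --template="$template" \
    --output="$work/doc.html" "$src" &&
  # Stage 2: headless Chromium prints that HTML to PDF.
  "${CHROMIUM:-chromium-browser}" --headless \
    --print-to-pdf="$out" "file://$work/doc.html"
  status=$?
  rm -rf "$work"
  return "$status"
}
```

A call such as `md_to_pdf report.md viewer-template.html report.pdf` would leave `report.pdf` next to the source if both tools succeed.
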
Converted bytes are cached at `<dir>/.zddc.d/converted/<base>.<ext>` with mtime
synced to the source, so a fresh cache hit is a stat-and-serve with no `exec`.
A PUT/DELETE/MOVE on the source `.md` purges the sidecars. Per-project header
metadata (client/project/contractor/project_number) comes from the `.zddc`
`convert:` cascade; title/tracking_number/revision/status are derived from the
filename via `zddc.ParseFilename`.
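
The sidecar layout and the mtime-based freshness test can be sketched like this. The helper names are hypothetical, and treating "fresh" as "neither file is newer than the other" is an assumption about how the synced mtime is compared; `-nt` is supported by bash and dash but is not in older POSIX `test`.

```shell
# Path of the cached conversion for a source .md and a target extension.
cache_path() {
  src=$1
  ext=$2
  dir=$(dirname "$src")
  base=$(basename "$src" .md)
  printf '%s/.zddc.d/converted/%s.%s\n' "$dir" "$base" "$ext"
}

# Fresh means the sidecar exists and its mtime equals the source's mtime.
cache_is_fresh() {
  src=$1
  cached=$2
  [ -f "$cached" ] && [ ! "$src" -nt "$cached" ] && [ ! "$cached" -nt "$src" ]
}

# Store converted bytes from stdin, then sync the mtime to the source.
store_in_cache() {
  src=$1
  cached=$2
  mkdir -p "$(dirname "$cached")" && cat > "$cached" && touch -r "$src" "$cached"
}
```

Any later write to the source changes its mtime, so `cache_is_fresh` starts failing even without an explicit purge.
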
Relevant flags (defaults in parens):
- `--convert-pandoc-binary` (`pandoc`) / `--convert-chromium-binary`
(`chromium-browser`; `chromium` on Debian) — PATH-resolved name or absolute path
- `--convert-scratch-dir` (`$TMPDIR`) — host scratch root for template + intermediates
- `--convert-mem-mib` (`1024`) — per-conversion memory cap (cgroup `memory.max`)
- `--convert-pids` (`256`) — per-conversion PID cap (cgroup `pids.max`)
- `--convert-timeout` (`60s`) — per-conversion wall clock (Go `context.WithTimeout`)
If `pandoc`/`chromium` aren't on PATH (e.g. running zddc-server outside the runtime
image) the endpoint serves 503 with a `Retry-After`; the rest of the server keeps
working. Running against raw pandoc/chromium with no wrapper gives a working but
**unsandboxed** endpoint — fine for dev iteration.
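
Before starting the server outside the runtime image, a quick PATH check along these lines can show whether the conversion endpoint is likely to answer 503. The function name `missing_binaries` is hypothetical; the binary names are the defaults listed above.

```shell
# Print each named binary that cannot be resolved on PATH.
missing_binaries() {
  for name in "$@"; do
    command -v "$name" >/dev/null 2>&1 || printf '%s\n' "$name"
  done
}
```

For example, `missing_binaries pandoc chromium-browser` prints nothing when both are installed and lists the absent names otherwise.
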
## Features
### Document Conversion (`convert`)
- **Batch processing**: Convert multiple Markdown files at once
- **Force overwrite**: `-f` flag to overwrite existing output files
- **Custom output directory**: `-o` flag to specify output location
- **Configuration-driven**: Uses `zddc.conf` for project-specific settings
- **Template integration**: Automatically applies the viewer template
- **Progress tracking**: Real-time conversion status and summary
### Professional Viewer Template (`viewer-template.html`)
- **Modern responsive design**: Works on desktop, tablet, and mobile
- **Table of Contents (TOC)**: Auto-generated sidebar navigation with smooth scrolling
- **Print optimization**: Professional formatting for PDF generation
  - Page break controls for tables
  - Repeating table headers
  - Proper page numbering
  - Clean print layout
- **URL hash navigation**: Shareable links to specific document sections
- **Mobile-friendly**: Collapsible sidebar and touch-optimized interface
- **Professional styling**: Clean typography optimized for technical documents
## Usage
### Basic Conversion
```bash
# Convert all Markdown files in current directory
./convert *.md
# Convert with force overwrite
./convert -f *.md
# Convert to specific output directory
./convert -o rendered/ *.md
# Combine flags
./convert -f -o rendered/ *.md
```
### Configuration (`zddc.conf`)
Create a `zddc.conf` file in your project directory. It is **sourced as shell**,
so use `var="value"` syntax (no spaces around `=`). Only these four variables are
read; all are optional and feed the document header via pandoc `--variable`:
```sh
contractor="Contractor Name" # contracting organization (header)
client="Client Name" # client org (header, paired with project)
project="Project Name" # full project name
project_number="AR 28088" # shown in parentheses after the project name
```
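
To make the "sourced as shell, passed via `--variable`" behaviour concrete, here is a minimal sketch of how the four variables could be turned into pandoc arguments. The function name `header_vars` is hypothetical and the real `convert` script may build its arguments differently.

```shell
# Source a zddc.conf in a subshell and print one --variable argument
# per non-empty setting (illustrative sketch only).
header_vars() {
  (
    conf=$1
    # "." searches PATH for names without a slash, so force a path.
    case $conf in */*) ;; *) conf=./$conf ;; esac
    contractor='' client='' project='' project_number=''
    [ -f "$conf" ] && . "$conf"
    for name in contractor client project project_number; do
      eval "value=\$$name"
      [ -n "$value" ] && printf '%s\n' "--variable=$name=$value"
    done
    :
  )
}
```

Because values may contain spaces, a caller would need to read the output line by line rather than rely on word splitting. Running the conf in a subshell also keeps its variables from leaking into the calling script.
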
The template path is discovered automatically (input dir → script dir →
symlink target) or set per-run with `-T`; the output directory is set with `-o`.
They are **not** `zddc.conf` keys.
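
The discovery order can be pictured with the sketch below. It assumes the file being looked for is named `viewer-template.html` and that "symlink target" means the directory of the file the script symlink resolves to; `find_template` is a hypothetical name, and `readlink -f` is the GNU coreutils form.

```shell
# Look for the template in: input dir, script dir, symlink target dir.
find_template() {
  input_dir=$1
  script_path=$2
  name=viewer-template.html
  script_dir=$(dirname "$script_path")
  target_dir=$(dirname "$(readlink -f "$script_path")")
  for dir in "$input_dir" "$script_dir" "$target_dir"; do
    if [ -f "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1
}
```

A template placed next to the input files therefore shadows the one installed alongside the script.
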
### Directory Structure
```
your-project/
├── zddc.conf # Configuration file
├── document1.md # Source Markdown files
├── document2.md
└── rendered/ # Generated HTML files
    ├── document1.html
    └── document2.html
```
## Template Features
### Navigation
- **TOC Generation**: Automatically creates navigation from document headings
- **Smooth Scrolling**: Click TOC items for smooth navigation to sections
- **Hash URLs**: Address bar updates with section anchors for sharing
- **Mobile Menu**: Collapsible sidebar for mobile devices
### Print Styling
- **Page Breaks**: Tables won't split across pages
- **Header Repetition**: Table headers repeat on each page
- **Professional Layout**: Optimized margins and typography
- **Page Numbers**: Sequential page numbering in footer
### Responsive Design
- **Desktop**: Full sidebar with TOC always visible
- **Tablet**: Collapsible sidebar with overlay
- **Mobile**: Hamburger menu with full-screen TOC overlay
## Advanced Usage
### Custom Templates
You can customize the viewer template by:
1. Copying `viewer-template.html` to your project
2. Modifying the CSS and HTML structure
3. Placing the copy in your input directory so it is discovered automatically, or passing it per-run with `-T` (the template is not a `zddc.conf` key)
### Batch Processing
For large document sets:
```bash
# Process all markdown files recursively
find . -name "*.md" -exec ./convert -f -o rendered/ {} +
# Process specific document types
./convert -f -o rendered/ *-SOW-*.md *-DBD-*.md
```
### Integration with Build Systems
The convert tool signals failure through its exit status, so it can be integrated into CI/CD pipelines:
```bash
# In a build script
if ./convert -f -o dist/ *.md; then
    echo "Documentation built successfully"
else
    echo "Documentation build failed"
    exit 1
fi
```
## File Types Supported
- **Input**: Markdown (`.md`), DOCX (`.docx`), and HTML (`.html`/`.htm`) files
(auto-detected: DOCX→MD, MD→HTML, HTML→MD; override with `-t md|html|docx`).
Direct DOCX→HTML is not supported — convert to MD first.
- **Output**: HTML files with embedded CSS and JavaScript (plus MD and DOCX targets)
- **Images**: Supports embedded images and diagrams
- **Tables**: Full table support with print optimization
- **Code**: Syntax highlighting for code blocks
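
The auto-detection rule for inputs can be summarized as a small lookup. This sketch only mirrors the mapping stated above; the function names are hypothetical, and matching lowercase extensions only is an assumption.

```shell
# Default target format for an input file, per the auto-detection rule.
default_target() {
  case $1 in
    *.docx) echo md ;;
    *.md) echo html ;;
    *.html | *.htm) echo md ;;
    *) return 1 ;;
  esac
}

# Reject the one combination documented as unsupported (DOCX -> HTML).
conversion_supported() {
  case "$1:$2" in
    *.docx:html) return 1 ;;
  esac
  return 0
}
```

For instance, `default_target report.docx` prints `md`, while `conversion_supported report.docx html` fails, matching the advice to convert to MD first.
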
## Dependencies
- **pandoc**: Document conversion engine
- **Modern browser**: For viewing generated HTML files
- **Optional**: Web server for serving files (prevents CORS issues)
## Troubleshooting
### Common Issues
1. **Template not found**: Place `viewer-template.html` in the input directory or next to the `convert` script, or pass it explicitly with `-T` (the template is not a `zddc.conf` key)
2. **Permission errors**: Make sure the `convert` script is executable (`chmod +x convert`)
3. **Missing output**: Check that the output directory exists or use `-o` to create it
4. **Print issues**: Use "Print to PDF" in the browser for best results
### Performance
- Large documents (>1000 pages) may take longer to render
- Consider splitting very large documents into sections
- Use batch processing for multiple files
## Examples
### Engineering Documentation
Perfect for:
- Design basis documents
- Specifications and standards
- Project requirements
- Technical procedures
- Quality documentation
### Features Optimized For
- **Professional appearance**: Clean, corporate styling
- **Technical content**: Tables, diagrams, code blocks
- **Print output**: PDF generation with proper formatting
- **Navigation**: Easy browsing of long documents
- **Sharing**: URL fragments for referencing specific sections