Commit graph

1 commit

Author SHA1 Message Date
16d88010a6 feat(server): full md/docx/html conversion matrix + base64 image inlining
Generalize the conversion engine from markdown-source-only to a (from→to)
dispatcher, convert.Convert, supporting:

  md   → docx | html | pdf
  docx → md   | html
  html → md   | docx

- convertToMarkdown (docx→md, html→md): pandoc -t gfm --wrap=none with an
  embedded inline-media.lua filter that base64-inlines mediabag images as data:
  URIs, so the output .md is self-contained (markdown has no --embed-resources).
- convertToHTML now takes a source format: docx→html reuses the doctype template
  and --embed-resources base64-inlines the docx's images automatically.
- convertToDocx takes a source format: html→docx embeds images natively.
- ToDocx/ToHTML/ToPDF are kept as the md-source entry points, delegating to the
  shared internals. writeScratchFiles generalizes the old template-set writer.

Routing (converthandler.go):
- RecognizeVirtualConvert maps any target ext {md,docx,html,pdf} to the first
  existing real sibling source by precedence (md←docx,html; docx←md,html;
  html←md,docx; pdf←md). Real files still win (dispatcher stats first).
- ServeConverted accepts md; buildAndStore dispatches on (ext(src), format) via
  convert.Convert; purgeConverted clears all derived siblings on any write.

Tests: per-direction command-shape assertions (convert) + recognizer matrix and
precedence (handler). Verified end-to-end with real pandoc (docx→md/html,
html→md/docx, base64 images). Full ./... suite green.

PDF stays markdown-only for now (docx/html→pdf would need a two-stage hop).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 21:02:11 -05:00