ZDDC/zddc/internal/convert/inline-media.lua
ZDDC 16d88010a6 feat(server): full md/docx/html conversion matrix + base64 image inlining
Generalize the conversion engine from markdown-source-only to a (from→to)
dispatcher, convert.Convert, supporting:

  md   → docx | html | pdf
  docx → md   | html
  html → md   | docx

- convertToMarkdown (docx→md, html→md): pandoc -t gfm --wrap=none with an
  embedded inline-media.lua filter that base64-inlines mediabag images as data:
  URIs, so the output .md is self-contained (markdown has no --embed-resources).
- convertToHTML now takes a source format: docx→html reuses the doctype template
  and --embed-resources base64-inlines the docx's images automatically.
- convertToDocx takes a source format: html→docx embeds images natively.
- ToDocx/ToHTML/ToPDF are kept as the md-source entry points, delegating to the
  shared internals. writeScratchFiles generalizes the old template-set writer.

Routing (converthandler.go):
- RecognizeVirtualConvert maps any target ext {md,docx,html,pdf} to the first
  existing real sibling source by precedence (md←docx,html; docx←md,html;
  html←md,docx; pdf←md). Real files still win (dispatcher stats first).
- ServeConverted accepts md; buildAndStore dispatches on (ext(src), format) via
  convert.Convert; purgeConverted clears all derived siblings on any write.

Tests: per-direction command-shape assertions (convert) + recognizer matrix and
precedence (handler). Verified end-to-end with real pandoc (docx→md/html,
html→md/docx, base64 images). Full ./... suite green.

PDF stays markdown-only for now (docx/html→pdf would need a two-stage hop).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 21:02:11 -05:00

31 lines
1.2 KiB
Lua

-- inline-media.lua — pandoc filter that rewrites every image to a self-contained
-- base64 data: URI, pulling the bytes from pandoc's mediabag (populated when
-- reading DOCX, or fetched for HTML). Used by the docx→md / html→md conversions
-- so the resulting markdown carries its images inline (markdown output has no
-- native --embed-resources equivalent).
local b = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
local function base64(data)
return ((data:gsub('.', function(x)
local r, byte = '', x:byte()
for i = 8, 1, -1 do r = r .. (byte % 2 ^ i - byte % 2 ^ (i - 1) > 0 and '1' or '0') end
return r
end) .. '0000'):gsub('%d%d%d?%d?%d?%d?', function(x)
if #x < 6 then return '' end
local c = 0
for i = 1, 6 do c = c + (x:sub(i, i) == '1' and 2 ^ (6 - i) or 0) end
return b:sub(c + 1, c + 1)
end) .. ({ '', '==', '=' })[#data % 3 + 1])
end
function Image(img)
local mt, data = pandoc.mediabag.lookup(img.src)
if not data then
mt, data = pandoc.mediabag.fetch(img.src)
end
if data then
img.src = 'data:' .. (mt or 'application/octet-stream') .. ';base64,' .. base64(data)
end
return img
end