Generalize the conversion engine from markdown-source-only to a (from→to)
dispatcher, convert.Convert, supporting:
  md   → docx | html | pdf
  docx → md | html
  html → md | docx
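The matrix above can be sketched as a lookup table keyed by source format. This is only an illustration: Format, supported, and CanConvert are hypothetical names, not the actual convert.Convert API.

```go
package main

import "fmt"

// Format is a hypothetical identifier for a document format (not from the source).
type Format string

const (
	MD   Format = "md"
	DOCX Format = "docx"
	HTML Format = "html"
	PDF  Format = "pdf"
)

// supported mirrors the matrix in the commit message: source -> allowed targets.
var supported = map[Format][]Format{
	MD:   {DOCX, HTML, PDF},
	DOCX: {MD, HTML},
	HTML: {MD, DOCX},
}

// CanConvert reports whether the (from, to) pair appears in the matrix.
func CanConvert(from, to Format) bool {
	for _, t := range supported[from] {
		if t == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanConvert(DOCX, MD))  // true
	fmt.Println(CanConvert(DOCX, PDF)) // false: PDF stays markdown-only
}
```

A real dispatcher would presumably map each pair to a conversion function rather than a boolean; the table form just makes the supported directions easy to audit.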
- convertToMarkdown (docx→md, html→md): runs pandoc -t gfm --wrap=none with an
embedded inline-media.lua filter that base64-inlines mediabag images as data:
URIs, so the output .md is self-contained (pandoc's --embed-resources applies
only to HTML output, so markdown needs the filter).
- convertToHTML now takes a source format: docx→html reuses the doctype template
and --embed-resources base64-inlines the docx's images automatically.
- convertToDocx takes a source format: html→docx embeds images natively.
- ToDocx/ToHTML/ToPDF are kept as the md-source entry points, delegating to the
shared internals. writeScratchFiles generalizes the old template-set writer.
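For the *→md direction, the command shape described above might be assembled roughly like this. markdownArgs and the file paths are hypothetical, and passing -f explicitly is an assumption; only -t gfm, --wrap=none, and the Lua filter come from the description.

```go
package main

import (
	"fmt"
	"strings"
)

// markdownArgs is a hypothetical helper that builds the pandoc argument list
// for a docx→md or html→md conversion. filterPath is assumed to point at a
// scratch copy of inline-media.lua written before pandoc runs.
func markdownArgs(from, src, out, filterPath string) []string {
	return []string{
		"-f", from, // assumption: the source format is passed explicitly
		"-t", "gfm",
		"--wrap=none",
		"--lua-filter=" + filterPath,
		"-o", out,
		src,
	}
}

func main() {
	args := markdownArgs("docx", "report.docx", "report.md", "/tmp/scratch/inline-media.lua")
	fmt.Println("pandoc " + strings.Join(args, " "))
}
```

Keeping argument construction in a pure function like this is what makes per-direction command-shape assertions possible without invoking pandoc.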
Routing (converthandler.go):
- RecognizeVirtualConvert maps any target ext {md,docx,html,pdf} to the first
existing real sibling source by precedence (md←docx,html; docx←md,html;
html←md,docx; pdf←md). Real files still win, because the dispatcher stats the
requested path first.
- ServeConverted accepts md; buildAndStore dispatches on (ext(src), format) via
convert.Convert; purgeConverted clears all derived siblings on any write.
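The precedence rule can be sketched as follows. pickSource and the injected exists callback are hypothetical (the real RecognizeVirtualConvert signature is not shown here); the precedence table itself is copied from the description above.

```go
package main

import "fmt"

// precedence lists, for each target extension, the sibling source extensions
// to try, in order.
var precedence = map[string][]string{
	"md":   {"docx", "html"},
	"docx": {"md", "html"},
	"html": {"md", "docx"},
	"pdf":  {"md"},
}

// pickSource is a hypothetical stand-in for the recognizer. base is the path
// without its extension; exists is injected so the sketch needs no filesystem.
func pickSource(base, target string, exists func(string) bool) (string, bool) {
	if exists(base + "." + target) {
		return "", false // a real file wins; nothing to convert
	}
	for _, ext := range precedence[target] {
		if src := base + "." + ext; exists(src) {
			return src, true
		}
	}
	return "", false
}

func main() {
	files := map[string]bool{"notes.docx": true, "notes.html": true}
	exists := func(p string) bool { return files[p] }

	src, ok := pickSource("notes", "md", exists)
	fmt.Printf("%q %v\n", src, ok) // "notes.docx" true

	src, ok = pickSource("notes", "pdf", exists)
	fmt.Printf("%q %v\n", src, ok) // "" false: pdf only derives from md
}
```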
Tests: per-direction command-shape assertions (convert) + recognizer matrix and
precedence (handler). Verified end-to-end with real pandoc (docx→md/html,
html→md/docx, base64 images). Full ./... suite green.
PDF stays markdown-only for now (docx/html→pdf would need a two-stage hop).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
31 lines
1.2 KiB
Lua
-- inline-media.lua: pandoc filter that rewrites every image to a self-contained
-- base64 data: URI, pulling the bytes from pandoc's mediabag (populated when
-- reading DOCX, or fetched for HTML). Used by the docx→md / html→md conversions
-- so the resulting markdown carries its images inline (markdown output has no
-- native --embed-resources equivalent).

-- Standard base64 alphabet.
local b = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

-- Pure-Lua base64 encoder: bytes -> bit string -> 6-bit groups -> alphabet.
local function base64(data)
  return ((data:gsub('.', function(x)
    -- Expand one byte into its 8 bits, most significant first.
    local r, byte = '', x:byte()
    for i = 8, 1, -1 do r = r .. (byte % 2 ^ i - byte % 2 ^ (i - 1) > 0 and '1' or '0') end
    return r
  end) .. '0000'):gsub('%d%d%d?%d?%d?%d?', function(x)
    -- Drop a leftover group shorter than 6 bits; map full groups to characters.
    if #x < 6 then return '' end
    local c = 0
    for i = 1, 6 do c = c + (x:sub(i, i) == '1' and 2 ^ (6 - i) or 0) end
    return b:sub(c + 1, c + 1)
  end) .. ({ '', '==', '=' })[#data % 3 + 1]) -- '=' padding chosen by input length mod 3
end

-- Replace each image source with a data: URI when its bytes are available.
function Image(img)
  local mt, data = pandoc.mediabag.lookup(img.src)
  if not data then
    mt, data = pandoc.mediabag.fetch(img.src)
  end
  if data then
    img.src = 'data:' .. (mt or 'application/octet-stream') .. ';base64,' .. base64(data)
  end
  return img
end
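To cross-check the filter's output from the Go side (for example in an end-to-end test), the same data: URI can be produced with the standard library. This is a sketch only: dataURI is a hypothetical helper, and treating an empty MIME string like Lua's nil is an assumption.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// dataURI builds the same string the Lua Image filter assigns to img.src.
// An empty mime falls back to application/octet-stream, mirroring the
// filter's `mt or 'application/octet-stream'` (assumption: "" stands in for nil).
func dataURI(mime string, data []byte) string {
	if mime == "" {
		mime = "application/octet-stream"
	}
	return "data:" + mime + ";base64," + base64.StdEncoding.EncodeToString(data)
}

func main() {
	fmt.Println(dataURI("text/plain", []byte("hi"))) // data:text/plain;base64,aGk=
	fmt.Println(dataURI("", []byte("Man")))          // data:application/octet-stream;base64,TWFu
}
```

The two inputs exercise both padding branches of the Lua encoder: a 2-byte input ends in a single '=', and a 3-byte input needs no padding.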