ZDDC/zddc/internal/fs/resolve_test.go
ZDDC 85521b98de feat(server): case-insensitive URL canonicalization at dispatch
URLs are now case-insensitive against the on-disk casing under
ZDDC_ROOT, with a lowercase-wins tiebreak when sibling case variants
exist. File and folder names preserve case on disk — the change is a
pure URL→FS-name mapping; nothing renames anything.

internal/fs/resolve.go ResolveCanonical walks segments left-to-right
under fsRoot. Per segment: try lowercase first (canonical / cheap
lstat fast-path), then exact-case, then readdir+CI scan with the
all-lowercase variant winning the tiebreak. Walk stops at the first
segment that doesn't exist on disk so virtual prefixes (.archive,
.profile, .tokens, .auth) and 404 paths flow through with their tail
preserved verbatim. Path-escape safety check on the resolved abs
path matches the existing safeJoin pattern.

Wired in at the top of cmd/zddc-server/main.go dispatch(), which
rewrites r.URL.Path before any handler runs. Downstream handlers
(plus their existing safeJoin calls and the cascade walker) pick up
canonical case automatically — no per-handler changes. The ACL
cascade benefits from this for free since EffectivePolicy is keyed
by the now-canonical absolute path.

internal/handler/middleware.go AccessLogMiddleware snapshots the
as-typed URL path before the rewrite. The audit log's `path` field
records what the client actually sent; a `resolved_path` field is
added only when canonicalization changed it. Operators reading the
log can see both the raw request and what was served.
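
The logging rule can be sketched as below. accessLogFields is a hypothetical helper; only the field names `path` and `resolved_path` come from this commit, and the real middleware's log structure may differ.

```go
package main

// accessLogFields builds the path-related audit fields (hypothetical
// helper). rawPath is the URL path as the client sent it, snapshotted
// before the rewrite; resolvedPath is the path after canonicalization.
func accessLogFields(rawPath, resolvedPath string) map[string]string {
	fields := map[string]string{"path": rawPath}
	// Emit resolved_path only when canonicalization changed something,
	// so requests already in canonical case log a single field.
	if resolvedPath != rawPath {
		fields["resolved_path"] = resolvedPath
	}
	return fields
}
```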

Lowercase as the project-wide canonical convention is already
honoured by the auto-created folders in internal/zddc/ensure.go
(working/, staging/, archive/<party>/incoming/) and the server's
own state dirs (_app/, .zddc.d/tokens/, .zddc.d/outbox/,
.zddc.d/logs/). Operators who drop a Mixed-Case-Folder/ on disk
keep that casing — the resolver finds it via the readdir tier.

Performance: the lowercase-first lstat is one syscall on the hot
path. Only segments that miss both lstat tiers (the on-disk name is
mixed-case and the URL spells it with different casing, or the
segment does not exist) pay the readdir+EqualFold scan, and Linux
caches small-directory reads aggressively. Apache mod_speling uses
the same "try then fallback" pattern.

Tests:
- internal/fs/resolve_test.go — 9 unit tests: exact-case,
  mixed-case-URL-with-lowercase-on-disk, mixed-case-URL-with-
  mixed-case-on-disk, both-cases-exist-lowercase-wins, nonexistent
  segment preserves remainder, file-segment terminates walk, escape
  rejection, trailing-slash normalization, root.
- cmd/zddc-server/main_test.go TestDispatchCaseInsensitiveURL —
  end-to-end through the dispatcher with sibling Archive/ and
  archive/ on disk; all four URL casings of the same path serve the
  lowercase variant's content (proves the tiebreak fires through
  every layer).
- Full Go suite green.

Docs: AGENTS.md gains a "URL handling" subsection in the
zddc-server section; ARCHITECTURE.md security-model table gains a
"URL canonicalization" row.

Out of scope (separate decisions, can revisit if needed):
- ACL glob CI-matching. If .zddc rules use mixed-case URL globs,
  they won't match the canonical lowercase URL. Workable today by
  writing rules in lowercase. Touches a different package.
- Redirect-to-canonical (301). Server serves under whichever case
  the client used; canonicalization is internal. Could 301 to
  canonical for SEO/bookmark hygiene as a follow-up.
- Client-mode (proxy/cache). Only master mode is wired so far.
  Cache-handler CI lives in internal/cache/cache.go cachePathFor
  and is a separate code path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 09:09:47 -05:00

package fs

import (
	"os"
	"path/filepath"
	"runtime"
	"testing"
)

// mkdir creates the directory formed by joining parts and fails the
// test on error.
func mkdir(t *testing.T, parts ...string) {
	t.Helper()
	if err := os.MkdirAll(filepath.Join(parts...), 0o755); err != nil {
		t.Fatal(err)
	}
}

// Root-like inputs ("/", "", "//") all resolve to the root itself.
func TestResolveCanonical_RootAndEmpty(t *testing.T) {
	root := t.TempDir()
	for _, in := range []string{"/", "", "//"} {
		abs, url, ok := ResolveCanonical(root, in)
		if !ok {
			t.Fatalf("%q: ok=false", in)
		}
		if abs != root || url != "/" {
			t.Fatalf("%q: abs=%q url=%q", in, abs, url)
		}
	}
}

// A lowercase URL over lowercase directories resolves unchanged.
func TestResolveCanonical_ExactCase(t *testing.T) {
	root := t.TempDir()
	mkdir(t, root, "archive", "incoming")
	abs, url, ok := ResolveCanonical(root, "/archive/incoming")
	if !ok || url != "/archive/incoming" {
		t.Fatalf("ok=%v url=%q", ok, url)
	}
	if abs != filepath.Join(root, "archive", "incoming") {
		t.Fatalf("abs=%q", abs)
	}
}

// A mixed-case URL maps onto the lowercase names that exist on disk.
func TestResolveCanonical_MixedCaseURLLowercaseOnDisk(t *testing.T) {
	root := t.TempDir()
	mkdir(t, root, "archive", "incoming")
	abs, url, ok := ResolveCanonical(root, "/Archive/Incoming")
	if !ok || url != "/archive/incoming" {
		t.Fatalf("ok=%v url=%q", ok, url)
	}
	if abs != filepath.Join(root, "archive", "incoming") {
		t.Fatalf("abs=%q", abs)
	}
}

// A lowercase URL finds mixed-case directories and the resolved URL
// adopts the on-disk casing.
func TestResolveCanonical_OnlyMixedCaseExists(t *testing.T) {
	root := t.TempDir()
	mkdir(t, root, "Archive", "Incoming")
	abs, url, ok := ResolveCanonical(root, "/archive/incoming")
	if !ok || url != "/Archive/Incoming" {
		t.Fatalf("ok=%v url=%q", ok, url)
	}
	if abs != filepath.Join(root, "Archive", "Incoming") {
		t.Fatalf("abs=%q", abs)
	}
}

// When Archive/ and archive/ are siblings, every URL casing must resolve
// to the lowercase variant.
func TestResolveCanonical_BothCasesExistLowercaseWins(t *testing.T) {
	if runtime.GOOS == "darwin" || runtime.GOOS == "windows" {
		t.Skip("filesystem may be case-insensitive; tiebreak only meaningful on case-sensitive FS")
	}
	root := t.TempDir()
	mkdir(t, root, "Archive")
	mkdir(t, root, "archive")
	// Distinct marker contents reveal which directory was actually chosen.
	if err := os.WriteFile(filepath.Join(root, "Archive", "marker"), []byte("upper"), 0o644); err != nil {
		t.Fatal(err)
	}
	if err := os.WriteFile(filepath.Join(root, "archive", "marker"), []byte("lower"), 0o644); err != nil {
		t.Fatal(err)
	}
	for _, in := range []string{"/Archive/marker", "/archive/marker", "/aRcHiVe/marker"} {
		abs, url, ok := ResolveCanonical(root, in)
		if !ok {
			t.Fatalf("%q: ok=false", in)
		}
		if url != "/archive/marker" {
			t.Fatalf("%q: url=%q (want /archive/marker)", in, url)
		}
		body, err := os.ReadFile(abs)
		if err != nil {
			t.Fatalf("%q: read %s: %v", in, abs, err)
		}
		if string(body) != "lower" {
			t.Fatalf("%q: body=%q (want \"lower\"; lowercase variant must win)", in, body)
		}
	}
}

func TestResolveCanonical_NonexistentSegmentPreservesRemainder(t *testing.T) {
	root := t.TempDir()
	mkdir(t, root, "archive")
	abs, url, ok := ResolveCanonical(root, "/Archive/.archive/TR-001.html")
	if !ok {
		t.Fatal("ok=false")
	}
	// Walk canonicalizes "Archive" to "archive"; the virtual ".archive"
	// segment doesn't exist on disk, so the remainder passes through
	// unchanged so the dispatcher's virtual-prefix routing still fires.
	if url != "/archive/.archive/TR-001.html" {
		t.Fatalf("url=%q", url)
	}
	if abs != filepath.Join(root, "archive", ".archive", "TR-001.html") {
		t.Fatalf("abs=%q", abs)
	}
}

func TestResolveCanonical_FileSegmentTerminatesWalk(t *testing.T) {
	root := t.TempDir()
	mkdir(t, root, "archive")
	if err := os.WriteFile(filepath.Join(root, "archive", "Doc.PDF"), []byte("x"), 0o644); err != nil {
		t.Fatal(err)
	}
	abs, url, ok := ResolveCanonical(root, "/Archive/doc.pdf")
	if !ok {
		t.Fatal("ok=false")
	}
	// On Linux Doc.PDF exists but doc.pdf does not. The requested name is
	// already lowercase, so both lstat tiers miss and the readdir scan
	// finds Doc.PDF case-insensitively.
	if url != "/archive/Doc.PDF" {
		t.Fatalf("url=%q", url)
	}
	if abs != filepath.Join(root, "archive", "Doc.PDF") {
		t.Fatalf("abs=%q", abs)
	}
}

func TestResolveCanonical_RejectsEscape(t *testing.T) {
	root := t.TempDir()
	mkdir(t, root, "archive")
	// The ".." segments climb above root. However ResolveCanonical
	// normalizes or walks them, the final containment check must
	// reject the result.
	_, _, ok := ResolveCanonical(root, "/archive/../../etc")
	if ok {
		t.Fatal("expected ok=false for escape path")
	}
}

// A trailing slash is dropped from the resolved URL.
func TestResolveCanonical_TrailingSlashesNormalized(t *testing.T) {
	root := t.TempDir()
	mkdir(t, root, "archive", "incoming")
	_, url, ok := ResolveCanonical(root, "/Archive/Incoming/")
	if !ok {
		t.Fatal("ok=false")
	}
	if url != "/archive/incoming" {
		t.Fatalf("url=%q", url)
	}
}