ZDDC/zddc/internal/handler/middleware.go
ZDDC 85521b98de feat(server): case-insensitive URL canonicalization at dispatch
URLs are now case-insensitive against the on-disk casing under
ZDDC_ROOT, with a lowercase-wins tiebreak when sibling case variants
exist. File and folder names preserve case on disk — the change is a
pure URL→FS-name mapping; nothing renames anything.

internal/fs/resolve.go ResolveCanonical walks segments left-to-right
under fsRoot. Per segment: try lowercase first (canonical / cheap
lstat fast-path), then exact-case, then readdir+CI scan with the
all-lowercase variant winning the tiebreak. Walk stops at the first
segment that doesn't exist on disk so virtual prefixes (.archive,
.profile, .tokens, .auth) and 404 paths flow through with their tail
preserved verbatim. Path-escape safety check on the resolved abs
path matches the existing safeJoin pattern.
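
The tiered lookup above can be sketched as follows. This is an illustrative model, not the code in internal/fs/resolve.go: `dirView`, `resolveSegment`, `resolvePath` and the in-memory `tree` are hypothetical names, directory listings stand in for the lstat and readdir calls, the path-escape check is omitted, and picking the first case-insensitive hit when no lowercase variant exists is an assumption. The sketch also drops a trailing slash, which may differ from the real normalization.

```go
package main

import (
	"fmt"
	"strings"
)

// dirView is a hypothetical in-memory stand-in for one directory:
// names lists its entries in the order a readdir would return them.
type dirView struct{ names []string }

func (d dirView) has(name string) bool {
	for _, n := range d.names {
		if n == name {
			return true
		}
	}
	return false
}

// resolveSegment maps one URL segment to an entry name using the three
// tiers from the commit message. On disk, tiers 1 and 2 would be lstat
// calls and tier 3 a readdir; here all three consult the same listing.
func resolveSegment(d dirView, seg string) (string, bool) {
	if lower := strings.ToLower(seg); d.has(lower) {
		return lower, true // tier 1: the lowercase name wins
	}
	if d.has(seg) {
		return seg, true // tier 2: exact case
	}
	for _, n := range d.names {
		if strings.EqualFold(n, seg) {
			return n, true // tier 3: case-insensitive scan, first hit (assumption)
		}
	}
	return "", false
}

func join(a, b string) string {
	if a == "" {
		return b
	}
	return a + "/" + b
}

// resolvePath walks urlPath left to right. tree maps a resolved
// directory path ("" is the root) to its listing. The walk stops at the
// first segment that does not resolve and keeps the tail verbatim.
func resolvePath(tree map[string]dirView, urlPath string) string {
	segs := strings.Split(strings.Trim(urlPath, "/"), "/")
	cur := ""
	for i, seg := range segs {
		if seg == "" {
			continue
		}
		d, isDir := tree[cur]
		name, ok := "", false
		if isDir {
			name, ok = resolveSegment(d, seg)
		}
		if !ok {
			return "/" + join(cur, strings.Join(segs[i:], "/"))
		}
		cur = join(cur, name)
	}
	return "/" + cur
}

func main() {
	tree := map[string]dirView{
		"":                  {names: []string{"Archive", "Mixed-Case-Folder", "archive"}},
		"archive":           {names: []string{"a.txt"}},
		"Archive":           {names: []string{"b.txt"}},
		"Mixed-Case-Folder": {names: []string{"Readme.md"}},
	}
	for _, p := range []string{"/ARCHIVE/A.TXT", "/mixed-case-folder/readme.md", "/.Archive/Foo"} {
		fmt.Println(p, "->", resolvePath(tree, p))
	}
}
```

With sibling Archive/ and archive/ present, every casing of the first segment lands in archive/, so a file that exists only under Archive/ is not found through the URL in this model.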

Wired in at the top of cmd/zddc-server/main.go dispatch(), which
rewrites r.URL.Path before any handler runs. Downstream handlers
(plus their existing safeJoin calls and the cascade walker) pick up
canonical case automatically — no per-handler changes. The ACL
cascade benefits from this for free since EffectivePolicy is keyed
by the now-canonical absolute path.
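
A minimal sketch of the rewrite-before-dispatch idea, assuming a resolver with the shape func(string) string. The commit does this inline at the top of dispatch(); the `canonicalize` wrapper, the `servedPath` helper, and the use of strings.ToLower as a toy resolver are all hypothetical. Clearing RawPath is also an assumption of this sketch, made so that no stale encoded form of the old path survives the rewrite.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// canonicalize rewrites r.URL.Path in place before the next handler
// runs, so everything downstream sees the resolved casing.
func canonicalize(resolve func(string) string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if p := resolve(r.URL.Path); p != r.URL.Path {
			r.URL.Path = p
			r.URL.RawPath = "" // assumption: drop any stale encoded form
		}
		next.ServeHTTP(w, r)
	})
}

// servedPath runs one request through the wrapper and reports the path
// the inner handler observed. strings.ToLower is only a toy resolver.
func servedPath(urlPath string) string {
	var seen string
	inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen = r.URL.Path
	})
	h := canonicalize(strings.ToLower, inner)
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, urlPath, nil))
	return seen
}

func main() {
	fmt.Println(servedPath("/Archive/File.TXT"))
}
```

Because the rewrite mutates the request's URL rather than copying the request, any middleware that holds the same *http.Request can compare the path before and after.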

internal/handler/middleware.go AccessLogMiddleware snapshots the
as-typed URL path before the rewrite. The audit log's `path` field
records what the client actually sent; a `resolved_path` field is
added only when canonicalization changed it. Operators reading the
log can see both the raw request and what was served.
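
To make the two-field behaviour concrete, here is a small sketch that renders access records through a slog JSON handler. `accessArgs` and `auditLine` are hypothetical helpers (the middleware builds its field list inline and carries more fields than shown here); only `path` and the conditional `resolved_path` are modelled.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
)

// accessArgs returns the path-related log fields: `path` is always the
// as-typed request path, `resolved_path` appears only when it differs.
func accessArgs(requested, served string) []any {
	args := []any{"path", requested}
	if served != requested {
		args = append(args, "resolved_path", served)
	}
	return args
}

// auditLine renders one record as JSON, the way an audit logger backed
// by slog.NewJSONHandler would.
func auditLine(requested, served string) string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	logger.Info("access", accessArgs(requested, served)...)
	return buf.String()
}

func main() {
	fmt.Print(auditLine("/Archive/File.TXT", "/archive/file.txt")) // has resolved_path
	fmt.Print(auditLine("/archive/file.txt", "/archive/file.txt")) // path only
}
```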

Lowercase as the project-wide canonical convention is already
honoured by the auto-created folders in internal/zddc/ensure.go
(working/, staging/, archive/<party>/incoming/) and the server's
own state dirs (_app/, .zddc.d/tokens/, .zddc.d/outbox/,
.zddc.d/logs/). Operators who drop a Mixed-Case-Folder/ on disk
keep that casing — the resolver finds it via the readdir tier.

Performance: the lowercase-first lstat is one syscall on the hot
path, and an exact-case hit costs one more. Only segments whose URL
casing matches neither the lowercase name nor the on-disk name pay
the readdir+EqualFold scan, and Linux keeps recently read small
directories cached. Apache mod_speling uses the same
"try then fallback" pattern.

Tests:
- internal/fs/resolve_test.go — 9 unit tests: exact-case,
  mixed-case-URL-with-lowercase-on-disk, mixed-case-URL-with-
  mixed-case-on-disk, both-cases-exist-lowercase-wins, nonexistent
  segment preserves remainder, file-segment terminates walk, escape
  rejection, trailing-slash normalization, root.
- cmd/zddc-server/main_test.go TestDispatchCaseInsensitiveURL —
  end-to-end through the dispatcher with sibling Archive/ and
  archive/ on disk; all four URL casings of the same path serve the
  lowercase variant's content (proves the tiebreak fires through
  every layer).
- Full Go suite green.

Docs: AGENTS.md gains a "URL handling" subsection in the
zddc-server section; ARCHITECTURE.md security-model table gains a
"URL canonicalization" row.

Out of scope (separate decisions, can revisit if needed):
- ACL glob CI-matching. If .zddc rules use mixed-case URL globs,
  they won't match the canonical lowercase URL. Workable today by
  writing rules in lowercase. Touches a different package.
- Redirect-to-canonical (301). Server serves under whichever case
  the client used; canonicalization is internal. Could 301 to
  canonical for SEO/bookmark hygiene as a follow-up.
- Client-mode (proxy/cache). Only master mode is wired so far.
  Cache-handler CI lives in internal/cache/cache.go cachePathFor
  and is a separate code path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 09:09:47 -05:00

232 lines
8.1 KiB
Go

package handler

import (
	"context"
	"errors"
	"log/slog"
	"net/http"
	"strings"
	"time"

	"codeberg.org/VARASYS/ZDDC/zddc/internal/auth"
	"codeberg.org/VARASYS/ZDDC/zddc/internal/config"
	"codeberg.org/VARASYS/ZDDC/zddc/internal/policy"
)

type contextKey string

// EmailKey is the context key for the authenticated user's email.
const EmailKey contextKey = "email"

// DeciderKey is the context key for the request's policy decider.
// Set by ACLMiddleware so handlers deep in the stack can issue policy
// queries without taking the decider as an explicit parameter. Although
// the decider is an app-wide singleton (not per-request state), routing
// it through context keeps the call-site signatures stable across the
// "swap internal evaluator for external OPA" plumbing change.
const DeciderKey contextKey = "policy-decider"

// ACLMiddleware extracts the user email and stores it (along with the
// policy decider) in the request context. It does NOT enforce ACL
// itself; each handler performs its own ACL check via
// policy.AllowFromChain.
//
// Two email sources, in order:
//
//  1. `Authorization: Bearer <token>`: if present, the token is
//     validated against the supplied auth.Store. On success, the
//     request runs as the token-file's email. On failure (invalid /
//     expired / no validator configured), the middleware short-circuits
//     with 401, because silently falling back to header-based auth
//     would let a misconfigured client masquerade as anonymous.
//  2. Otherwise, the email is read from cfg.EmailHeader, exactly as
//     before. This is the upstream-auth-proxy path (oauth2-proxy,
//     Caddy auth, etc.) that injects the header on validated requests.
//
// `tokens` may be nil: deployments without the token system simply
// reject any Bearer attempts with 401. This keeps Bearer-vs-no-Bearer
// trust paths decoupled from the operator's choice to issue tokens.
func ACLMiddleware(cfg config.Config, decider policy.Decider, tokens *auth.Store, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var email string
		if bearer := bearerToken(r); bearer != "" {
			if tokens == nil {
				http.Error(w, "Unauthorized", http.StatusUnauthorized)
				return
			}
			tok, err := tokens.Validate(bearer)
			if err != nil {
				if !errors.Is(err, auth.ErrInvalidToken) {
					slog.Warn("token validation error", "err", err)
				}
				http.Error(w, "Unauthorized", http.StatusUnauthorized)
				return
			}
			email = tok.Email
		} else {
			email = r.Header.Get(cfg.EmailHeader)
		}

		// DEBUG-level header dump for diagnosing proxy / SSO header
		// passthrough. Off by default (LogLevel info); enable with
		// ZDDC_LOG_LEVEL=debug. Logs the configured header name, the
		// observed value at that name, and the full request header
		// map so an operator can see exactly what reached the binary.
		// Note: at debug level this also captures auth tokens, cookies,
		// and anything else upstream proxies forward, so only enable it
		// in trusted environments.
		slog.Debug("request headers",
			"configured", cfg.EmailHeader,
			"observed", email,
			"headers", r.Header)

		ctx := context.WithValue(r.Context(), EmailKey, email)
		if decider != nil {
			ctx = context.WithValue(ctx, DeciderKey, decider)
		}
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// bearerToken returns the token value from the Authorization header,
// or the empty string when no Bearer credential is present. The scheme
// name is matched case-insensitively (RFC 9110 § 11.1; the Bearer
// scheme itself is defined in RFC 6750).
func bearerToken(r *http.Request) string {
	v := r.Header.Get("Authorization")
	if v == "" {
		return ""
	}
	const prefix = "bearer "
	if len(v) <= len(prefix) || !strings.EqualFold(v[:len(prefix)], prefix) {
		return ""
	}
	return strings.TrimSpace(v[len(prefix):])
}

// EmailFromContext extracts the user email from the request context.
func EmailFromContext(r *http.Request) string {
	if v, ok := r.Context().Value(EmailKey).(string); ok {
		return v
	}
	return ""
}

// WithEmail returns a context carrying email under EmailKey. Test seam
// for handlers that look up the authenticated user via EmailFromContext;
// production traffic gets the same value injected by ACLMiddleware.
func WithEmail(ctx context.Context, email string) context.Context {
	return context.WithValue(ctx, EmailKey, email)
}

// DeciderFromContext extracts the policy decider from the request
// context. Returns the internal decider as a fallback if none was
// installed; this matches the "no OPA configured" semantics and
// keeps test setups that don't install ACLMiddleware functional.
func DeciderFromContext(r *http.Request) policy.Decider {
	if v, ok := r.Context().Value(DeciderKey).(policy.Decider); ok {
		return v
	}
	return &policy.InternalDecider{}
}

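// responseWriter wraps http.ResponseWriter to capture status code and bytes written.
type responseWriter struct {
	http.ResponseWriter
	status int
	bytes  int
	wrote  bool // true once a final (non-1xx) status has been recorded
}

// WriteHeader records the status code and writes it to the underlying
// ResponseWriter. Only the first final status is recorded: net/http
// ignores superfluous WriteHeader calls, so a later code never reaches
// the client and must not overwrite the logged one.
func (rw *responseWriter) WriteHeader(code int) {
	if !rw.wrote {
		rw.status = code
		// 1xx informational responses may still be followed by the final status.
		rw.wrote = code >= 200
	}
	rw.ResponseWriter.WriteHeader(code)
}

// Write records the bytes written and writes to the underlying ResponseWriter.
func (rw *responseWriter) Write(b []byte) (int, error) {
	// A Write without a prior final WriteHeader sends an implicit 200.
	if !rw.wrote {
		rw.status = http.StatusOK
		rw.wrote = true
	}
	n, err := rw.ResponseWriter.Write(b)
	rw.bytes += n
	return n, err
}
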
// HSTSMiddleware sets the Strict-Transport-Security response header,
// instructing browsers to refuse plain-HTTP connections to this host
// for the next year (NIST SP 800-52 Rev. 2 § 4.4.6, also DoD STIG
// expectation; OWASP recommendation max-age >= 1 year). Use ONLY when
// zddc-server is itself terminating TLS; when an upstream proxy
// terminates, that proxy should set HSTS instead.
//
// includeSubDomains is set; preload is not (preload requires
// pre-submitting the domain to the browser-vendor list, which is out
// of scope for this server, and operators who want it can override
// upstream).
//
// max-age = 31536000 = 365 days.
func HSTSMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Strict-Transport-Security", "max-age=31536000; includeSubDomains")
		next.ServeHTTP(w, r)
	})
}

// AccessLogMiddleware logs a structured line per HTTP request after the
// response is written.
//
// Always emits to slog.Default() (stderr) so server-lifecycle logs and
// access logs share an output stream by default.
//
// If `auditLogger` is non-nil, the same structured fields are also written
// to it. The intended caller wires up auditLogger with a JSON handler
// pointing at a rotating file (see cmd/zddc-server's setupAccessAuditLog),
// so an operator gets a persisted audit trail on disk in addition to the
// stderr stream, which is useful when stderr is not journald-captured
// (e.g. container logging where the orchestrator drops stderr after
// restarts).
func AccessLogMiddleware(auditLogger *slog.Logger, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Capture request start time
		start := time.Now()

		// Snapshot the as-typed URL path before downstream handlers may
		// rewrite it (case-insensitive canonicalization). The audit
		// stream records what the client actually sent, not the
		// resolved canonical form.
		requestedPath := r.URL.Path

		// Wrap the ResponseWriter
		wrapped := &responseWriter{ResponseWriter: w, status: 200}

		// Serve the request
		next.ServeHTTP(wrapped, r)

		// Calculate duration
		durationMs := int(time.Since(start).Milliseconds())

		// Get email from context
		email := EmailFromContext(r)
		if email == "" {
			email = "anonymous"
		}

		args := []any{
			"ts", start.Format(time.RFC3339),
			"email", email,
			"method", r.Method,
			"path", requestedPath,
			"status", wrapped.status,
			"bytes", wrapped.bytes,
			"duration_ms", durationMs,
		}
		if r.URL.Path != requestedPath {
			args = append(args, "resolved_path", r.URL.Path)
		}

		// Stderr stream (existing behavior).
		slog.Info("access", args...)

		// Audit file (when configured). Same fields, separate handler so
		// the file can be JSON-formatted regardless of stderr's handler.
		if auditLogger != nil {
			auditLogger.Info("access", args...)
		}
	})
}