ZDDC/zddc/internal/handler/archivehandler.go
ZDDC a0f9fca95d feat(archive): canonicalize deep .archive URLs + permissions follow the file
The .archive virtual prefix is now project-scoped at exactly one URL
depth: any /<project>/<sub>/.../.archive/... gets a 301 to the
canonical /<project>/.archive/.... The dispatcher does this before
calling the handler; query strings are preserved (the browser handles
the fragment automatically). .archive is also GET/HEAD-only — anything
else returns 405 with Allow: GET, HEAD, ahead of the file API.
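
A minimal sketch of the dispatcher-side behavior described above, not the
actual dispatcher. canonicalArchivePath and dispatchArchive are hypothetical
names, and checking the method before redirecting is an assumption; the
commit only states that both happen ahead of the handler and the file API.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// archiveSegment is the virtual prefix named in the commit message.
const archiveSegment = ".archive"

// canonicalArchivePath rewrites /<project>/<sub>/.../.archive/<rest> to
// /<project>/.archive/<rest>. changed reports whether a redirect is needed;
// ok is false when there is no project segment ahead of .archive or no
// .archive segment at all.
func canonicalArchivePath(p string) (canonical string, changed, ok bool) {
	segs := strings.Split(strings.TrimPrefix(p, "/"), "/")
	if segs[0] == "" || segs[0] == archiveSegment {
		return "", false, false
	}
	for i := 1; i < len(segs); i++ {
		if segs[i] == archiveSegment {
			return "/" + segs[0] + "/" + strings.Join(segs[i:], "/"), i != 1, true
		}
	}
	return "", false, false
}

// dispatchArchive is a hypothetical stand-in for the dispatcher branch.
func dispatchArchive(w http.ResponseWriter, r *http.Request) {
	canonical, changed, ok := canonicalArchivePath(r.URL.Path)
	if !ok {
		http.NotFound(w, r)
		return
	}
	// Assumed ordering: reject non-GET/HEAD before anything else.
	if r.Method != http.MethodGet && r.Method != http.MethodHead {
		w.Header().Set("Allow", "GET, HEAD")
		http.Error(w, "Method Not Allowed", http.StatusMethodNotAllowed)
		return
	}
	if changed {
		// Preserve the query string; the fragment never reaches the server.
		if r.URL.RawQuery != "" {
			canonical += "?" + r.URL.RawQuery
		}
		http.Redirect(w, r, canonical, http.StatusMovedPermanently)
		return
	}
	fmt.Fprintln(w, "ServeArchive would be called here")
}

func main() {
	for _, tc := range []struct{ method, target string }{
		{http.MethodGet, "/proj/a/b/.archive/T-1.html?rev=2"},
		{http.MethodGet, "/proj/.archive/T-1.html"},
		{http.MethodPut, "/proj/.archive/T-1.html"},
	} {
		rec := httptest.NewRecorder()
		dispatchArchive(rec, httptest.NewRequest(tc.method, tc.target, nil))
		fmt.Println(tc.method, tc.target, "->", rec.Code, rec.Header().Get("Location"))
	}
}
```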

Why: offline-built HTML files reference siblings as
"../.archive/<tracking>.html" from arbitrary depths. All of those refs
should converge on a single stable URL per (project, tracking) so
external links and bookmarks don't fork by entry point.

Permissions now follow the resolved file, not .archive itself.
.archive is a virtual surface — it has no on-disk directory and no
.zddc of its own, so gating it as if it did is wrong. Two gates only:

  - Resolve: only the per-target file's ACL chain decides. A user
    explicitly allowed at one transmittal folder but denied at the
    project root can still fetch tracking numbers that resolve there.
    Per-target denial returns 404 (not 403) so existence doesn't leak.

  - Listing: filter entries by per-target ACL. If the project bucket
    has zero indexed entries → 404 (unknown / empty project, indistinguishable
    from a probe). If the bucket is non-empty but the caller can read
    no entries → 403 (existence-leak guard: don't confirm an inaccessible
    project's archive exists). Otherwise → 200 with the filtered subset.
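
The two gates above reduce to a small status decision. The sketch below is
illustrative only: entry, filterReadable, listingStatus and resolveStatus are
hypothetical names, and canReadDir stands in for the real per-target ACL
chain lookup, which is assumed here to yield one decision per directory.

```go
package main

import (
	"fmt"
	"net/http"
	"path"
)

// entry is a stand-in for one indexed archive entry.
type entry struct {
	URLName    string // name under /<project>/.archive/
	TargetPath string // slash-separated path of the resolved file
}

// filterReadable keeps the entries whose target directory the caller may
// read, asking canReadDir at most once per directory.
func filterReadable(entries []entry, canReadDir func(dir string) bool) []entry {
	cache := make(map[string]bool)
	out := make([]entry, 0, len(entries))
	for _, e := range entries {
		dir := path.Dir(e.TargetPath)
		ok, seen := cache[dir]
		if !seen {
			ok = canReadDir(dir)
			cache[dir] = ok
		}
		if ok {
			out = append(out, e)
		}
	}
	return out
}

// listingStatus maps (indexed entries, readable entries) to the status code
// described in the Listing bullet.
func listingStatus(indexed, readable int) int {
	switch {
	case indexed == 0:
		return http.StatusNotFound
	case readable == 0:
		return http.StatusForbidden
	default:
		return http.StatusOK
	}
}

// resolveStatus maps the Resolve gate: a miss and a per-target denial both
// return 404, so the two cases look the same to the caller.
func resolveStatus(found, allowed bool) int {
	if !found || !allowed {
		return http.StatusNotFound
	}
	return http.StatusOK
}

func main() {
	entries := []entry{
		{"T-001.html", "proj/tx-01/T-001.html"},
		{"T-002.html", "proj/tx-01/T-002.html"},
		{"T-003.html", "proj/tx-02/T-003.html"},
	}
	calls := 0
	canRead := func(dir string) bool {
		calls++
		return dir == "proj/tx-01"
	}
	visible := filterReadable(entries, canRead)
	fmt.Println(len(visible), "readable entries,", calls, "policy lookups")
	fmt.Println("listing status:", listingStatus(len(entries), len(visible)))
	fmt.Println("resolve status (found, denied):", resolveStatus(true, false))
}
```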

The listing endpoint is now content-negotiated like ServeDirectory:
Accept: text/html serves the embedded `browse` SPA bytes (with the
embedded ETag and X-ZDDC-Source: embedded:browse); Accept:
application/json returns the JSON entry array (with content-hash ETag
and 304 short-circuit). Vary: Accept set on both. The browse SPA's
auto-detect path-fetch then renders the archive entries as a sortable,
filterable flat list at /<project>/.archive/.
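
As a rough illustration of the negotiation and the 304 short-circuit, the
sketch below uses hypothetical helpers (wantsJSON, contentETag, notModified).
The real listing ETag helper is not shown in this commit, so the truncated
SHA-256 used here is an assumption, not the project's scheme.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// wantsJSON mirrors the substring check on the Accept header: any mention
// of application/json selects the JSON branch, everything else gets HTML.
func wantsJSON(accept string) bool {
	return strings.Contains(accept, "application/json")
}

// contentETag derives a quoted ETag from the response body, so the tag
// changes exactly when the listing bytes change. Hash choice is assumed.
func contentETag(body []byte) string {
	sum := sha256.Sum256(body)
	return `"` + hex.EncodeToString(sum[:8]) + `"`
}

// notModified reports whether the client's If-None-Match value matches the
// current ETag exactly, in which case a 304 with no body can be sent.
func notModified(ifNoneMatch, etag string) bool {
	return ifNoneMatch != "" && ifNoneMatch == etag
}

func main() {
	body := []byte(`[{"name":"T-001.html"}]`)
	etag := contentETag(body)

	fmt.Println("browser navigation -> JSON?", wantsJSON("text/html,application/xhtml+xml"))
	fmt.Println("SPA path-fetch     -> JSON?", wantsJSON("application/json"))
	fmt.Println("first request      -> 304?", notModified("", etag))
	fmt.Println("revalidation       -> 304?", notModified(etag, etag))
}
```

An exact string comparison does not handle a comma-separated If-None-Match
list or weak validators; that is enough for a client that echoes back the
single ETag it was given.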

ServeArchive's signature is now (cfg, idx, w, r, project, filename) —
the dispatcher hands the normalized project string in directly, so
projectFromContextPath is gone. Old behavior was to derive project
from contextPath inside the handler; with the upstream redirect that's
redundant and the handler's preconditions are simpler.

Tests: archivehandler_test.go rewritten around the new semantics;
added per-target-only resolve, project-root-deny + per-target-allow
rescue, listing 403/404 distinction, JSON/HTML content-negotiation,
and conditional GET. main_test.go gains TestDispatchArchiveRedirect
(deep paths, query preservation, already-canonical no-op) and
TestDispatchArchiveMethodGate (PUT/POST/DELETE → 405).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 06:28:07 -05:00


package handler

import (
	"encoding/json"
	"log/slog"
	"net/http"
	"path/filepath"
	"strings"

	"codeberg.org/VARASYS/ZDDC/zddc/internal/apps"
	"codeberg.org/VARASYS/ZDDC/zddc/internal/archive"
	"codeberg.org/VARASYS/ZDDC/zddc/internal/config"
	"codeberg.org/VARASYS/ZDDC/zddc/internal/listing"
	"codeberg.org/VARASYS/ZDDC/zddc/internal/policy"
	"codeberg.org/VARASYS/ZDDC/zddc/internal/zddc"
)

// ServeArchive handles requests under a project's .archive virtual path.
//
// The dispatcher canonicalizes every .archive request to /<project>/.archive/...
// before reaching here (any deeper /<project>/sub/.../.archive/... gets a 301
// to the project-rooted form), so this handler only ever sees one shape:
// project = first URL segment, filename = whatever follows .archive/.
//
// Permissions follow the FILE, not .archive itself. .archive is a virtual
// surface — it has no on-disk directory and no .zddc of its own. Two gates
// only:
//
// 1. Listing: returned entries are filtered by the per-target file's ACL
// chain. If the project bucket is empty (or doesn't exist in the index)
// the response is 404; if the user can read NO entries in a non-empty
// bucket the response is 403, so existence of an inaccessible project's
// archive does not leak.
//
// 2. Resolve: only the per-target file's ACL gates access. A user with
// no project-root permission but an explicit allow on one transmittal
// folder can fetch that file's tracking-number URL; conversely, a user
// with broad project access but a narrower deny on a specific subtree
// gets 404 (not 403) on its tracking numbers — existence must not leak.
//
// Listings serve the embedded `browse` SPA on Accept: text/html and the
// JSON entry array on Accept: application/json — same content negotiation
// as ServeDirectory, so the SPA's auto-detect path-fetch works at .archive
// URLs identically to real directories.
func ServeArchive(cfg config.Config, idx *archive.Index, w http.ResponseWriter, r *http.Request, project, filename string) {
	if project == "" {
		http.Error(w, "Not Found: .archive must be requested under a project directory (e.g. /<project>/.archive/)", http.StatusNotFound)
		return
	}

	email := EmailFromContext(r)
	decider := DeciderFromContext(r)
	ctx := r.Context()

	if filename == "" {
		serveArchiveListing(cfg, idx, w, r, project, email, decider)
		return
	}

	target, ok := archive.Resolve(idx, project, filename)
	if !ok {
		http.Error(w, "Not Found", http.StatusNotFound)
		return
	}

	// Per-target ACL is the only gate. 404 (not 403) so the tracking
	// number's mere existence isn't disclosed to a caller who can't
	// actually read the resolved file.
	fileDir := filepath.Dir(filepath.Join(cfg.Root, filepath.FromSlash(target)))
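	chain, err := zddc.EffectivePolicy(cfg.Root, fileDir)
	if err != nil {
		// Fail closed, as serveArchiveListing does: a policy that cannot
		// be loaded must not grant access to the resolved file.
		slog.Warn("ACL policy error on resolved file", "path", fileDir, "err", err)
		http.Error(w, "Not Found", http.StatusNotFound)
		return
	}
	if allowed, _ := policy.AllowFromChain(ctx, decider, chain, email, "/"+target); !allowed {
		http.Error(w, "Not Found", http.StatusNotFound)
		return
	}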

	// Serve in place; DO NOT redirect to the resolved file's real path.
	// People share .archive/<tracking>.html#section URLs and expect the
	// link to keep tracking the latest revision; redirecting would pin
	// the bookmark to a specific transmittal-folder snapshot. The
	// canonicalization redirect (/<project>/<sub>/.archive/X → /<project>/.archive/X)
	// happens upstream in the dispatcher and is a different thing: it
	// only collapses the .archive prefix, not the resolved bytes.
	//
	// Cache-Control: no-cache forces conditional revalidation each load.
	// http.ServeFile sets Last-Modified/ETag from the on-disk file, so
	// when the resolver picks a newer target the ETag changes and the
	// browser refetches.
	absFile := filepath.Join(cfg.Root, filepath.FromSlash(target))
	w.Header().Set("Cache-Control", "no-cache")
	http.ServeFile(w, r, absFile)
}

func serveArchiveListing(cfg config.Config, idx *archive.Index, w http.ResponseWriter, r *http.Request, project, email string, decider policy.Decider) {
	ctx := r.Context()

	allEntries := idx.AllEntries(project)
	if len(allEntries) == 0 {
		// Project bucket missing or empty. 404 with no body distinction
		// from "unknown project": a caller probing for project names
		// gets the same shape whether or not the project exists.
		http.Error(w, "Not Found", http.StatusNotFound)
		return
	}

	archiveBase := "/" + project + "/" + cfg.IndexPath + "/"

	// ACL chains are folder-keyed and the listing typically hits the same
	// few directories repeatedly (one per transmittal folder), so cache
	// the allow/deny decision per directory rather than re-walking .zddc
	// files for every entry.
	aclCache := make(map[string]bool)
	allowed := func(targetPath string) bool {
		fileDir := filepath.Dir(filepath.Join(cfg.Root, filepath.FromSlash(targetPath)))
		if v, ok := aclCache[fileDir]; ok {
			return v
		}
		chain, err := zddc.EffectivePolicy(cfg.Root, fileDir)
		if err != nil {
			aclCache[fileDir] = false
			return false
		}
		v, _ := policy.AllowFromChain(ctx, decider, chain, email, "/"+targetPath)
		aclCache[fileDir] = v
		return v
	}

	result := make([]listing.FileInfo, 0, len(allEntries))
	for _, e := range allEntries {
		if !allowed(e.TargetPath) {
			continue
		}
		result = append(result, listing.FileInfo{
			Name:  e.URLName,
			URL:   archiveBase + e.URLName,
			IsDir: false,
		})
	}

	// Existence-leak guard: if the user can read no entries in a
	// non-empty bucket, 403; never confirm the project's archive
	// exists to a caller with no permissions in it.
	if len(result) == 0 {
		http.Error(w, "Forbidden", http.StatusForbidden)
		return
	}

	// Vary: Accept is critical because the same URL serves either the
	// JSON listing or the embedded browse SPA depending on Accept;
	// without it, browsers/CDNs may serve one Accept's body for the
	// other Accept value and break the SPA's JSON auto-fetch.
	w.Header().Set("Vary", "Accept")

	if strings.Contains(r.Header.Get("Accept"), "application/json") {
		body, err := json.Marshal(result)
		if err != nil {
			slog.Error("encoding archive listing", "err", err)
			http.Error(w, "Internal Server Error", http.StatusInternalServerError)
			return
		}
		etag := `"` + listingETag(body) + `"`
		w.Header().Set("Content-Type", "application/json")
		w.Header().Set("ETag", etag)
		w.Header().Set("Cache-Control", "private, max-age=0, must-revalidate")
		if match := r.Header.Get("If-None-Match"); match != "" && match == etag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		_, _ = w.Write(body)
		return
	}

	// HTML: serve the embedded `browse` SPA. The SPA auto-detects the
	// server-mode listing by re-fetching this same URL with
	// Accept: application/json; that path lands in the JSON branch
	// above and renders the archive entries as a sortable, filterable
	// flat list.
	body := apps.EmbeddedBytes("browse")
	if len(body) == 0 {
		// Bootstrap state: a fresh build hasn't populated browse.html
		// into the embed yet. Fall through to JSON for clients that
		// will still parse it.
		jsonBody, err := json.Marshal(result)
		if err != nil {
			slog.Error("encoding archive listing (no-embed fallback)", "err", err)
			http.Error(w, "Internal Server Error", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Header().Set("Cache-Control", "no-cache")
		_, _ = w.Write(jsonBody)
		return
	}

	etag := `"` + apps.EmbeddedETag("browse") + `"`
	w.Header().Set("ETag", etag)
	w.Header().Set("Cache-Control", "public, max-age=0, must-revalidate")
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	w.Header().Set("X-ZDDC-Source", "embedded:browse")
	if match := r.Header.Get("If-None-Match"); match != "" && match == etag {
		w.WriteHeader(http.StatusNotModified)
		return
	}
	_, _ = w.Write(body)
}