fix(client): three bugs found by live smoke testing

Phase 3 + 4 live two-instance smoke tests against the synthetic
~/zddc-test-data fixture surfaced three real bugs that the unit
tests missed. All three are fixed in this commit.

1. walker: filenames with spaces/parens land on disk percent-encoded

   walkSubtree was passing the URL-encoded child URL (built via
   url.PathEscape) to fetchFileIfNeeded → cachePathFor, so a file
   named "Foo (IFI) - Bar.md" landed at <root>/.../Foo%20%28IFI%29
   %20-%20Bar.md on disk. Then purgeOrphans iterated os.ReadDir
   (which sees the encoded names) and compared against upstreamNames
   (decoded names from the listing JSON). Every fetched file was
   classified as an orphan and immediately deleted: a 180-file walk
   produced "fetched=180 purged=111" with only 70 files remaining.

   Fix: walker now maintains two parallel path strings — dirURL
   (URL-encoded for HTTP requests) and dirPath (decoded for disk
   keys). fetchFileIfNeeded, fetchListing, persistOnly, and
   purgeOrphans all take the decoded path. listingCachePathFor
   gets dirPath too. Smoke confirmed: dirs=29 files=180 fetched=179
   purged=0 (one file already cached from the user's GET that
   triggered the walk).

2. outbox: replay loop sleeps 5min after eager startup pass

   RunReplayLoop's idle-poll interval is 5min. After the eager
   startup pass with 0 entries, the loop sleeps 5min — even if a
   PUT-while-offline arrives 1 second later, replay won't fire for
   ~5 min. The cache returned 202 promptly but the queued write sat
   on disk until either a 5min nap elapsed or another PUT happened.

   Fix: Outbox gains a wake chan (buffered=1, drop-on-full).
   Enqueue posts to it after writing meta.json. RunReplayLoop selects
   on wake alongside the timer, so a new offline write triggers an
   immediate replay attempt. Smoke confirmed: PUT queued at T+0,
   master back at T+3, replay completes at T+3 (was previously a
   30s wait through the timer-based poll).

3. master: PUT/DELETE didn't honor If-Unmodified-Since

   The cache's outbox sends If-Unmodified-Since: <cached-mtime> on
   replay so the master can reject conflicting writes with 412. The
   master's checkIfMatch only evaluated If-Match (ETag-based), so
   the cache's mtime-based precondition was silently ignored. Result:
   an offline PUT staged before an external mod would clobber the
   newer external content on replay — silent data loss in the exact
   scenario the outbox is designed to detect.

   Fix: checkIfMatch now also evaluates If-Unmodified-Since per
   RFC 7232 §3.4, returning 412 when the file's current mtime is
   strictly later than the header value (1-second resolution to
   match HTTP-Date precision). Smoke confirmed: cache GET → external
   mod via direct file write → cache offline PUT → master back →
   replay sends IUS → master 412 → outbox entry renamed to
   <id>.conflict-<RFC3339>/ → master content preserved (the
   external mod, not the stale offline write).

Also added an info-level "outbox: replay attempt" log to tryReplay
so an operator watching the cache logs sees the replay loop is
alive even when every entry defers (transport error). Previously
the loop was silent unless a replay actually completed (200) or
conflicted (412).

go vet + go test ./... + go test -race ./internal/{cache,auth,handler}/...
all green. Synthetic ~/zddc-test-data fixture (553 files, 144 PDFs)
exercises the walker against realistic ZDDC filenames including
spaces, parens, and accented characters that the unit tests'
"a.txt" / "b.txt" inputs never hit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
ZDDC 2026-05-08 09:34:07 -05:00
parent e19667b5a2
commit 70d49ba111
3 changed files with 124 additions and 30 deletions

View file

@ -79,6 +79,12 @@ type Outbox struct {
cache *Cache cache *Cache
dir string dir string
mu sync.Mutex // serializes file creation; replay holds it briefly per-entry mu sync.Mutex // serializes file creation; replay holds it briefly per-entry
// wake is a non-blocking signal channel. Enqueue posts to it
// (drop-on-full) so RunReplayLoop reacts to new entries within
// ms, not after its idle-poll interval. Buffered=1 so a burst of
// enqueues collapses to a single wake.
wake chan struct{}
} }
// NewOutbox opens (and creates if missing) the outbox directory under // NewOutbox opens (and creates if missing) the outbox directory under
@ -89,7 +95,7 @@ func NewOutbox(c *Cache) (*Outbox, error) {
return nil, fmt.Errorf("create outbox dir: %w", err) return nil, fmt.Errorf("create outbox dir: %w", err)
} }
_ = os.Chmod(dir, 0o700) _ = os.Chmod(dir, 0o700)
return &Outbox{cache: c, dir: dir}, nil return &Outbox{cache: c, dir: dir, wake: make(chan struct{}, 1)}, nil
} }
// Dir exposes the on-disk path for tests / diagnostics. // Dir exposes the on-disk path for tests / diagnostics.
@ -152,6 +158,13 @@ func (o *Outbox) Enqueue(r *http.Request, baseModTime time.Time) (*OutboxEntry,
_ = os.RemoveAll(entryDir) _ = os.RemoveAll(entryDir)
return nil, err return nil, err
} }
// Signal the replay loop. Non-blocking — buffered=1 collapses
// a burst of enqueues to a single wake. If the loop is busy
// replaying, our entry will be picked up on the next pass.
select {
case o.wake <- struct{}{}:
default:
}
return &entry, nil return &entry, nil
} }
@ -336,6 +349,11 @@ func (o *Outbox) markConflict(id string) error {
// schedule. Stops when ctx is cancelled. Safe to call once per // schedule. Stops when ctx is cancelled. Safe to call once per
// process; not designed for multiple concurrent loops over the same // process; not designed for multiple concurrent loops over the same
// outbox. // outbox.
//
// Wakes on three signals:
// - timer (ReplayInterval when entries pending, ReplayIdleInterval when idle)
// - Enqueue posting to o.wake (instant reaction to new offline writes)
// - context cancel (shutdown)
func (o *Outbox) RunReplayLoop(ctx context.Context) { func (o *Outbox) RunReplayLoop(ctx context.Context) {
// Eagerly attempt replay at startup so a re-launched client // Eagerly attempt replay at startup so a re-launched client
// catches up before any user request fires. // catches up before any user request fires.
@ -351,15 +369,26 @@ func (o *Outbox) RunReplayLoop(ctx context.Context) {
return return
case <-time.After(interval): case <-time.After(interval):
o.tryReplay(ctx) o.tryReplay(ctx)
case <-o.wake:
// New entry arrived while we were idle. Replay
// immediately rather than waiting for the timer.
o.tryReplay(ctx)
} }
} }
} }
// tryReplay is a single replay pass with a bounded timeout. Errors // tryReplay is a single replay pass with a bounded timeout. Errors
// logged at debug; per-entry outcomes logged inside Replay/replayOne. // logged at debug; per-entry outcomes logged inside Replay/replayOne.
// The pass itself logs at info when it runs against >0 entries so an
// operator watching the log sees the replay loop is alive even if
// every attempt defers.
func (o *Outbox) tryReplay(ctx context.Context) { func (o *Outbox) tryReplay(ctx context.Context) {
tCtx, cancel := context.WithTimeout(ctx, 2*time.Minute) tCtx, cancel := context.WithTimeout(ctx, 2*time.Minute)
defer cancel() defer cancel()
pending, _ := o.Pending()
if len(pending) > 0 {
slog.Info("outbox: replay attempt", "pending", len(pending))
}
replayed, conflicts, err := o.Replay(tCtx) replayed, conflicts, err := o.Replay(tCtx)
if err != nil && !errors.Is(err, context.Canceled) && !errors.Is(err, context.DeadlineExceeded) { if err != nil && !errors.Is(err, context.Canceled) && !errors.Is(err, context.DeadlineExceeded) {
slog.Debug("outbox: replay error", "err", err) slog.Debug("outbox: replay error", "err", err)

View file

@ -172,13 +172,23 @@ func (s *MirrorScheduler) walkSubtree(ctx context.Context, subtree string) (walk
var fetchWG sync.WaitGroup var fetchWG sync.WaitGroup
var walkDir func(dirURL string) error // walkDir keeps two parallel views of the directory's path:
walkDir = func(dirURL string) error { // dirURL — URL-encoded form, used to compose HTTP requests
// dirPath — decoded form, used as the filesystem cache key.
// Filenames with spaces / parens / etc. need URL escaping for
// HTTP but must NOT land on disk percent-encoded — otherwise the
// orphan-purge pass below can't match local entries against the
// upstream's (decoded) listing names.
var walkDir func(dirURL, dirPath string) error
walkDir = func(dirURL, dirPath string) error {
if !strings.HasSuffix(dirURL, "/") { if !strings.HasSuffix(dirURL, "/") {
dirURL += "/" dirURL += "/"
} }
if !strings.HasSuffix(dirPath, "/") {
dirPath += "/"
}
addDirs() addDirs()
entries, err := s.fetchListing(ctx, dirURL) entries, err := s.fetchListing(ctx, dirURL, dirPath)
if err != nil { if err != nil {
return err return err
} }
@ -195,10 +205,13 @@ func (s *MirrorScheduler) walkSubtree(ctx context.Context, subtree string) (walk
if ctx.Err() != nil { if ctx.Err() != nil {
return ctx.Err() return ctx.Err()
} }
childURL := dirURL + url.PathEscape(strings.TrimSuffix(e.Name, "/")) name := strings.TrimSuffix(e.Name, "/")
childURL := dirURL + url.PathEscape(name)
childPath := dirPath + name
if e.IsDir { if e.IsDir {
childURL += "/" childURL += "/"
if err := walkDir(childURL); err != nil { childPath += "/"
if err := walkDir(childURL, childPath); err != nil {
slog.Debug("walk subdir failed", "url", childURL, "err", err) slog.Debug("walk subdir failed", "url", childURL, "err", err)
} }
continue continue
@ -206,33 +219,34 @@ func (s *MirrorScheduler) walkSubtree(ctx context.Context, subtree string) (walk
addFiles() addFiles()
fetchWG.Add(1) fetchWG.Add(1)
sem <- struct{}{} sem <- struct{}{}
go func(fileURL string) { go func(fileURL, filePath string) {
defer fetchWG.Done() defer fetchWG.Done()
defer func() { <-sem }() defer func() { <-sem }()
if s.fetchFileIfNeeded(ctx, fileURL) { if s.fetchFileIfNeeded(ctx, fileURL, filePath) {
addFetched() addFetched()
} }
}(childURL) }(childURL, childPath)
} }
// Purge local files no longer present upstream. Only acts on // Purge local files no longer present upstream. Only acts on
// regular files inside this directory — listing sidecars, // regular files inside this directory — listing sidecars,
// dot-prefix internals, and subdirectories are left alone. // dot-prefix internals, and subdirectories are left alone.
s.purgeOrphans(dirURL, upstreamNames, addPurged) s.purgeOrphans(dirPath, upstreamNames, addPurged)
return nil return nil
} }
err := walkDir(subtree + "/") err := walkDir(subtree+"/", subtree+"/")
fetchWG.Wait() fetchWG.Wait()
return stats, err return stats, err
} }
// purgeOrphans removes local cached files in dirURL that are absent // purgeOrphans removes local cached files in dirPath (decoded URL
// from the upstream listing. Only operates on plain files; ignores // path) that are absent from the upstream listing. Only operates on
// dot-prefix entries (cache state) and subdirectories (they get their // plain files; ignores dot-prefix entries (cache state) and
// own pass during walkSubtree's recursion). // subdirectories (they get their own pass during walkSubtree's
func (s *MirrorScheduler) purgeOrphans(dirURL string, upstreamNames map[string]bool, addPurged func()) { // recursion).
rel := filepath.FromSlash(strings.Trim(dirURL, "/")) func (s *MirrorScheduler) purgeOrphans(dirPath string, upstreamNames map[string]bool, addPurged func()) {
rel := filepath.FromSlash(strings.Trim(dirPath, "/"))
localDir := filepath.Join(s.cache.root, rel) localDir := filepath.Join(s.cache.root, rel)
entries, err := os.ReadDir(localDir) entries, err := os.ReadDir(localDir)
if err != nil { if err != nil {
@ -251,14 +265,14 @@ func (s *MirrorScheduler) purgeOrphans(dirURL string, upstreamNames map[string]b
} }
_ = os.Remove(filepath.Join(localDir, name)) _ = os.Remove(filepath.Join(localDir, name))
addPurged() addPurged()
slog.Info("purged orphan", "path", path.Join(dirURL, name)) slog.Info("purged orphan", "path", path.Join(dirPath, name))
} }
} }
// fetchListing GETs the upstream JSON listing for dirURL and parses // fetchListing GETs the upstream JSON listing for dirURL (encoded)
// the entries. Caches the JSON sidecar as a side-effect (so offline // and parses the entries. Caches the JSON sidecar at dirPath
// directory browsing works for walked subtrees). // (decoded) as a side-effect so the directory is browsable offline.
func (s *MirrorScheduler) fetchListing(ctx context.Context, dirURL string) ([]listingEntry, error) { func (s *MirrorScheduler) fetchListing(ctx context.Context, dirURL, dirPath string) ([]listingEntry, error) {
target := s.cache.upstream + dirURL target := s.cache.upstream + dirURL
req, err := http.NewRequestWithContext(ctx, http.MethodGet, target, nil) req, err := http.NewRequestWithContext(ctx, http.MethodGet, target, nil)
if err != nil { if err != nil {
@ -281,7 +295,9 @@ func (s *MirrorScheduler) fetchListing(ctx context.Context, dirURL string) ([]li
return nil, fmt.Errorf("read listing body: %w", err) return nil, fmt.Errorf("read listing body: %w", err)
} }
// Persist the JSON sidecar so the directory is browsable offline. // Persist the JSON sidecar so the directory is browsable offline.
if path, ok := s.cache.listingCachePathFor(dirURL, "application/json"); ok { // listingCachePathFor expects a decoded directory URL (keyed by
// FS path), so dirPath rather than dirURL here.
if path, ok := s.cache.listingCachePathFor(dirPath, "application/json"); ok {
if err := os.MkdirAll(filepath.Dir(path), 0o755); err == nil { if err := os.MkdirAll(filepath.Dir(path), 0o755); err == nil {
tmp, err := os.CreateTemp(filepath.Dir(path), ".zddc-cache-tmp-*") tmp, err := os.CreateTemp(filepath.Dir(path), ".zddc-cache-tmp-*")
if err == nil { if err == nil {
@ -316,11 +332,12 @@ type listingEntry struct {
IsDir bool `json:"is_dir"` IsDir bool `json:"is_dir"`
} }
// fetchFileIfNeeded conditional-GETs a file URL. Returns true when // fetchFileIfNeeded conditional-GETs a file URL. fileURL is the
// the cache was updated (200 + write); false when no work was needed // URL-encoded HTTP target; filePath is the decoded URL path used as
// (304 or skipped). // the on-disk cache key. Returns true when the cache was updated
func (s *MirrorScheduler) fetchFileIfNeeded(ctx context.Context, fileURL string) bool { // (200 + write); false when no work was needed (304 or skipped).
cachePath, ok := s.cache.cachePathFor(fileURL) func (s *MirrorScheduler) fetchFileIfNeeded(ctx context.Context, fileURL, filePath string) bool {
cachePath, ok := s.cache.cachePathFor(filePath)
if !ok { if !ok {
return false return false
} }
@ -353,8 +370,10 @@ func (s *MirrorScheduler) fetchFileIfNeeded(ctx context.Context, fileURL string)
if !s.cache.responseCacheable(resp) { if !s.cache.responseCacheable(resp) {
return false return false
} }
if err := s.cache.persistOnly(resp, fileURL); err != nil { // persistOnly resolves the disk path via cachePathFor too — pass
slog.Debug("walker persist failed", "url", fileURL, "err", err) // the decoded filePath, not the encoded fileURL.
if err := s.cache.persistOnly(resp, filePath); err != nil {
slog.Debug("walker persist failed", "path", filePath, "err", err)
return false return false
} }
s.cache.maybeWriteMarker() s.cache.maybeWriteMarker()

View file

@ -12,6 +12,7 @@ import (
"os" "os"
"path/filepath" "path/filepath"
"strings" "strings"
"time"
"codeberg.org/VARASYS/ZDDC/zddc/internal/config" "codeberg.org/VARASYS/ZDDC/zddc/internal/config"
"codeberg.org/VARASYS/ZDDC/zddc/internal/policy" "codeberg.org/VARASYS/ZDDC/zddc/internal/policy"
@ -239,7 +240,18 @@ func fileETagOnDisk(absPath string) (string, error) {
// exist, the wildcard "*" form fails (per RFC) but a specific ETag is // exist, the wildcard "*" form fails (per RFC) but a specific ETag is
// treated as a no-current-file hit (412). This distinguishes // treated as a no-current-file hit (412). This distinguishes
// create-new from update-existing semantically. // create-new from update-existing semantically.
//
// Also honors If-Unmodified-Since (RFC 7232 §3.4): the request fails
// with 412 if the current file's mtime is strictly later than the
// header value. Used by the cache layer's offline-write outbox to
// detect concurrent modifications without ETag round-trips — the
// cached file's mtime (set from upstream's Last-Modified) becomes the
// base for the precondition. Either header (or both) can be present;
// both must pass.
func checkIfMatch(w http.ResponseWriter, r *http.Request, absPath string) bool { func checkIfMatch(w http.ResponseWriter, r *http.Request, absPath string) bool {
if !checkIfUnmodifiedSince(w, r, absPath) {
return false
}
header := strings.TrimSpace(r.Header.Get("If-Match")) header := strings.TrimSpace(r.Header.Get("If-Match"))
if header == "" { if header == "" {
return true return true
@ -264,6 +276,40 @@ func checkIfMatch(w http.ResponseWriter, r *http.Request, absPath string) bool {
return true return true
} }
// checkIfUnmodifiedSince evaluates the RFC 7232 §3.4 precondition.
// Returns true (pass) when the header is absent or unparseable, when
// the target file does not exist, or when the file's current mtime
// is at or before the header value. Returns false (fail) and writes a
// 412 response when the file has been modified after the header time.
//
// mtime comparison uses the file's mod time truncated to whole
// seconds — HTTP-Date format has 1-second resolution, so a finer
// comparison would spuriously fail on filesystems that retain ns
// precision. "After" therefore means strictly greater than the
// header value at second resolution.
func checkIfUnmodifiedSince(w http.ResponseWriter, r *http.Request, absPath string) bool {
header := strings.TrimSpace(r.Header.Get("If-Unmodified-Since"))
if header == "" {
return true
}
since, err := http.ParseTime(header)
if err != nil {
// Per RFC 7232: if the header value is unparseable, ignore.
return true
}
info, err := os.Stat(absPath)
if err != nil {
// Missing file → no resource to compare against. Pass.
return true
}
current := info.ModTime().Truncate(time.Second)
if current.After(since.Truncate(time.Second)) {
http.Error(w, "Precondition Failed — If-Unmodified-Since: file modified at "+current.UTC().Format(http.TimeFormat), http.StatusPreconditionFailed)
return false
}
return true
}
func serveFilePut(cfg config.Config, w http.ResponseWriter, r *http.Request) { func serveFilePut(cfg config.Config, w http.ResponseWriter, r *http.Request) {
abs, cleanURL, ok, status, msg := resolveTargetPath(cfg, r.URL.Path) abs, cleanURL, ok, status, msg := resolveTargetPath(cfg, r.URL.Path)
if !ok { if !ok {