feat(client): cache mode — on-demand fetch + persist + offline fallback

zddc-server can now run as a downstream client of another zddc-server.
Set --upstream <url> and the master-side machinery (archive index, apps
server, watcher, OPA decider, ACL middleware, token store) is bypassed
entirely; cmd/zddc-server/main.go short-circuits to runClient(cfg)
which uses zddc/internal/cache/Cache as the entire request handler.

Three modes via --mode <proxy|cache|mirror>:
- proxy: forward upstream live, no disk persistence
- cache (default): persist responses on access; subsequent hits serve
  from disk + background If-Modified-Since revalidate
- mirror: accepted but currently behaves like cache; the access-
  triggered walker lands in phase 3

Cache directory layout is intentionally a normal ZDDC root: a file
fetched from <master>/foo/bar.txt is stored at <root>/foo/bar.txt with
no sidecar metadata. The local file's mtime is set to the upstream's
Last-Modified header so revalidation reflects the master's notion of
file age, not local fetch time. Running zddc-server --root <cache-dir>
without --upstream serves the cached files as a plain master — useful
for portable offline snapshots. A small .zddc-upstream marker is
written once on first persist for provenance.

Pipeline (GET/HEAD only — writes deferred):
- Hit → http.ServeContent serves directly (range-aware, 304-aware) +
  background revalidate (304 no-op, 200 overwrite, 403/404 purge)
- Miss → forward to upstream with the configured bearer; tee response
  body to client + tmp-file atomically renamed into the cache
- Network error + cached → serve stale + X-ZDDC-Cache: offline
- Network error + no cache → 503 + X-ZDDC-Cache: offline
- Directories always proxy live (no listing cache yet — phase 3)
- Cache-Control: no-store / private and non-200 responses are forwarded
  but not persisted

Range requests work end-to-end (Range/If-Range headers forwarded on
miss; http.ServeContent handles them natively on hit). Hop-by-hop
headers per RFC 7230 §6.1 are dropped from forwarded responses.

New flags (also as ZDDC_* env vars), all ignored when --upstream is
empty (so master deployments are untouched):
- --upstream <url>
- --mode proxy|cache|mirror (default cache)
- --bearer-file <path> (0600 file with the master-issued token)
- --skip-tls-verify (separate from --no-auth; for self-signed dev certs)

Validation: --upstream must be http(s)://...; trailing / is trimmed.
Mode validated to one of the three known values. The startup
no-root-.zddc check is skipped in client mode (the cache directory
starts empty by design). The plain-HTTP-on-non-loopback check is also
skipped (the local instance never reads the email header to decide
anything; auth is forwarded to upstream as a Bearer).

Tests: zddc/internal/cache/cache_test.go runs httptest.NewServer as
the upstream and covers miss-then-hit, proxy-mode-no-persist,
directory-never-cached, HEAD-no-body, offline-with-cache,
offline-no-cache → 503, bearer forwarding, query-string preservation,
no-store bypass, path-traversal rejection, error-status forwarding,
revalidate-on-403/404/200/304, range-on-hit, concurrent-same-URL,
cache-path boundary cases. 23 new tests, full suite + go vet clean.

Live two-instance smoke verified: master at 127.0.0.1:18443, client
at :18444 with --mode cache, miss→hit→hit transitions work, file
materialises under cache root with parent dirs created, marker file
written once, range-on-hit returns 206, master sees background 304s
on every hit, killing master leaves cached files serving from disk
and never-cached files returning 503 + offline header.

Doc updates: zddc/README.md gains a "Client mode" section with the
modes table, flag reference, pipeline summary, two-instance recipe,
and explicit list of phase-2 limitations; AGENTS.md adds the four
new env vars to the reference table and a "Client mode" subsection
with smoke-test recipe and a pointer to the cache package;
ARCHITECTURE.md adds "Master + proxy/cache/mirror" before "Bearer
token issuance," covering the topology, the persist/warm switches,
the cache-IS-a-ZDDC-root invariant, the request pipeline, and the
v1-out-of-scope multi-tenancy note; CLAUDE.md's zddc/ entry
expanded to mention both deployment shapes so future agents pick it
up by default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZDDC 2026-05-08 07:57:14 -05:00
parent 97ffaac13b
commit ca00904f1e
8 changed files with 1350 additions and 3 deletions


@ -447,12 +447,55 @@ ZDDC_ROOT=/path/to/your/archive ZDDC_TLS_CERT=none ZDDC_ADDR=:8080 \
| `ZDDC_CORS_ORIGIN` | *(empty)* | Comma-separated CORS allowlist; empty (default) disables CORS — appropriate for embedded-tools deployments where tools and data are same-origin. Set explicitly only for self-hosted tools at a different host (e.g. `https://tools.acme.com`) or the CDN-bootstrap pattern (`https://zddc.varasys.io`). |
| `ZDDC_INSECURE` | *(empty)* | Must be `1` to allow startup with no `<ZDDC_ROOT>/.zddc`. Without it, the server refuses to start because no `.zddc` files anywhere → public-by-default. Set only for deliberately-public archives. |
| `ZDDC_NO_AUTH` | *(empty)* | `1` skips ACL enforcement entirely on this instance. On a master: anyone reads everything (dev / trusted-LAN read-only deployments). On a downstream proxy/cache/mirror: trust upstream's filtering, don't re-evaluate ACLs locally. **Distinct from `ZDDC_INSECURE`** (which gates a startup safety check). |
| `ZDDC_UPSTREAM` | *(empty)* | Master URL (`https://master.example.com`). When set, the binary runs as a **client** (downstream proxy/cache/mirror) instead of a master — the master-side machinery (archive index, apps server, watcher, OPA, ACL middleware, token store) is replaced by the cache layer in `zddc/internal/cache/`. `--root` becomes the cache directory. |
| `ZDDC_MODE` | `cache` | Client mode: `proxy` (forward live, no persistence), `cache` (default; persist responses on access), `mirror` (phase 3 — currently behaves like `cache`). Ignored when `ZDDC_UPSTREAM` is empty. |
| `ZDDC_BEARER_FILE` | *(empty)* | Path to a 0600 file containing the master-issued token (see `/.tokens` on the master). Forwarded as `Authorization: Bearer …` to upstream on every request. Ignored when `ZDDC_UPSTREAM` is empty. |
| `ZDDC_SKIP_TLS_VERIFY` | *(empty)* | `1` accepts self-signed / untrusted upstream certs. Distinct from `ZDDC_NO_AUTH`. Dev / internal-CA scenarios only. |
| `ZDDC_OPA_URL` | `internal` | Policy decider endpoint. `internal` (default) = in-process Go evaluator (same `.zddc` cascade we always had). `http(s)://...` or `unix:///...` = external OPA — every access decision becomes a `POST /v1/data/zddc/access/allow` to the configured endpoint. Federal customers with their own audited Rego use this; commercial deployments leave it `internal`. |
| `ZDDC_OPA_FAIL_OPEN` | *(empty)* | External OPA only. `1` = allow on transport error; default = fail closed (deny). |
| `ZDDC_OPA_CACHE_TTL` | `1s` | External OPA only. Per-decision cache TTL — amortizes round-trips on bursty patterns (e.g. `.archive` listings hit the same `(email, dir)` tuple many times). `0` disables. Format is Go `time.ParseDuration`. |
| `ZDDC_APPS_PUBKEY` | *(empty)* | Path to PEM Ed25519 pubkey for verifying signatures on URL-fetched `apps:` artifacts. Empty = URL apps refused. Download from `zddc.varasys.io/pubkey.pem` (canonical channels) or supply your own. No baked-in default — same posture as TLS certs. Alternative inline form: `apps_pubkey:` in root `.zddc` (root-only, env/flag wins). |
| `ZDDC_ACCESS_LOG` | `<ZDDC_ROOT>/.zddc.d/logs/access-<host>.log` | JSON-line audit log (lumberjack-rotated, 100 MB / 10 backups / 90 days, gzipped). Server auto-mkdirs the parent. Set explicitly to empty (`--access-log=`) to disable. Per-host filename + `host` field in every record so multi-replica deployments writing to the same `.zddc.d/` dir disambiguate cleanly. |
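A minimal Go sketch of the client-mode flag checks described for `ZDDC_UPSTREAM` and `ZDDC_MODE`: an http(s) scheme is required, trailing `/` is trimmed, and the mode is limited to the three known values. The helper name `validateClientFlags` and the choice to default an empty mode to `cache` inside the validator are assumptions for illustration, not the code in `cmd/zddc-server/main.go`; it would only run when `--upstream` is non-empty.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// validateClientFlags (hypothetical) applies the client-mode startup checks:
// --upstream must be an absolute http(s) URL (trailing slashes are trimmed)
// and --mode must be proxy, cache, or mirror; an empty mode means cache.
func validateClientFlags(upstream, mode string) (string, string, error) {
	upstream = strings.TrimRight(upstream, "/")
	u, err := url.Parse(upstream)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return "", "", fmt.Errorf("--upstream must be http(s)://...: got %q", upstream)
	}
	if mode == "" {
		mode = "cache"
	}
	switch mode {
	case "proxy", "cache", "mirror":
		return upstream, mode, nil
	default:
		return "", "", fmt.Errorf("--mode must be proxy, cache or mirror: got %q", mode)
	}
}
```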
### Client mode (proxy / cache / mirror)
When `--upstream <url>` is set, the binary runs as a **downstream client** of another zddc-server instead of a master. `cmd/zddc-server/main.go` short-circuits to `runClient(cfg)`, which builds a `*cache.Cache` from `zddc/internal/cache/` and uses it as the entire request handler — no archive index, no apps server, no watcher, no OPA decider, no ACL middleware, no token store.
Three modes via `--mode <proxy|cache|mirror>` (default `cache`). Cache directory layout is intentionally a normal ZDDC root: `<master>/foo/bar.txt` → `<root>/foo/bar.txt`. Unset `--upstream` and the same root serves as a plain master, useful for portable offline snapshots.
Pipeline (GET/HEAD only in phase 2):
- Cache hit → serve immediately + background `If-Modified-Since` revalidate (304 no-op, 200 overwrite, 403/404 purge).
- Cache miss → forward to upstream; stream response simultaneously to client and a tmp-file atomically renamed into the cache.
- Network error + cached version → serve stale + `X-ZDDC-Cache: offline`.
- Network error + no cache → 503 + `X-ZDDC-Cache: offline`.
- Directories (`/.../`) always proxy live; no listing cache yet (phase 3 / mirror mode).
- `Cache-Control: no-store` / `private` responses pass through but are not persisted.
Two-instance smoke test recipe:
```sh
# Master.
mkdir -p /tmp/m && echo 'admins: [you@example.com]' > /tmp/m/.zddc
echo "hello" > /tmp/m/hello.txt
zddc-server --root /tmp/m --addr 127.0.0.1:18443 --tls-cert=none --no-auth &
# Client (cache mode).
mkdir -p /tmp/c
zddc-server --root /tmp/c --addr 127.0.0.1:18444 --tls-cert=none \
--upstream http://127.0.0.1:18443 --mode cache --no-auth &
curl -sI http://127.0.0.1:18444/hello.txt | grep -i x-zddc-cache # → miss
curl -sI http://127.0.0.1:18444/hello.txt | grep -i x-zddc-cache # → hit
ls /tmp/c # → hello.txt + .zddc-upstream marker
kill %1; sleep 1
curl -sI http://127.0.0.1:18444/hello.txt | grep -i x-zddc-cache # → hit (still served from disk)
curl -si http://127.0.0.1:18444/never.txt | head -1 # → 503
```
`X-ZDDC-Cache` response header values: `miss`, `hit`, `proxy` (no-persist or directory), `offline` (network unreachable). Useful for browser-side freshness UI.
Implementation: `zddc/internal/cache/cache.go` (a single file). Tests in `zddc/internal/cache/cache_test.go` use `httptest.NewServer` as a fake upstream and cover hit/miss/offline/range/bearer-forwarding/no-store paths.
### Bearer tokens (CLI auth)
zddc-server self-issues bearer tokens for CLI / non-browser callers. No external IDP, no JWKS rotation. Source of truth: `<ZDDC_ROOT>/.zddc.d/tokens/<sha256-hex>` — a YAML file per token with `email`, `created`, `expires`, `description`. Filename is the **hash** of the token; the plaintext is never persisted.


@ -468,6 +468,53 @@ none of them is load-bearing alone.
| Audit log | Reconstruct who did what after the fact | JSON-line tee per request to `<ZDDC_ROOT>/.zddc.d/logs/access-<host>.log`; writes also emit `file_write` op records |
| File API | Authenticated CRUD over the served tree | `zddc/internal/handler/fileapi.go` — PUT/DELETE/POST routed through the same ACL chain as GET, with per-method verbs (`r`/`w`/`c`/`d`/`a`). Mkdir under `Incoming`/`Working`/`Staging` writes a creator-owned `.zddc` automatically |
### Master + proxy / cache / mirror
The same `zddc-server` binary runs in two distinct topologies:
- **Master mode** (default): the binary owns a file tree under `--root`, applies `.zddc` ACL cascades to incoming requests, serves files / virtual app HTML / archive listings / form submissions / table views. The "normal" zddc-server. Everything in `cmd/zddc-server/main.go` apart from the `runClient` short-circuit lives here.
- **Client mode** (`--upstream <url>` set): the binary becomes a downstream proxy/cache/mirror against another zddc-server. The master-side machinery (archive index, apps server, watcher, OPA decider, ACL middleware, token store) is **bypassed entirely**. `zddc/internal/cache/` is the entire request handler.
Three sub-modes within client mode, controlled by `--mode <proxy|cache|mirror>`:
| Mode | Persists responses? | Subtree warmer? | Use case |
|---|---|---|---|
| `proxy` | no | no | thin pass-through; nothing on local disk |
| `cache` (default) | yes | no | field engineer — what you've viewed is available offline |
| `mirror` | yes | yes (planned, phase 3) | vendor mirrors of their subtree; admin backups; complete offline working set |
Internally the modes collapse to two switches on a single request-handling pipeline (`persist`, `warm`). Proxy is cache without disk writes; mirror is cache plus an access-triggered walker. In code, `cache.New` reads `cfg.Mode` once and sets `c.persist = mode != "proxy"`; the warmer is the only piece that doesn't exist yet (phase 3).
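The mode-to-switch mapping reduces to two booleans. The struct and function names are hypothetical; only the `persist = mode != "proxy"` rule comes from the description above, and `warm` has no consumer until the phase 3 walker exists.

```go
package main

// clientSwitches (hypothetical) holds the two knobs the client modes reduce to.
type clientSwitches struct {
	persist bool // write upstream responses into the cache directory
	warm    bool // run the access-triggered subtree walker (phase 3; not built yet)
}

// switchesFor maps a validated --mode value onto the switches.
func switchesFor(mode string) clientSwitches {
	return clientSwitches{
		persist: mode != "proxy",
		warm:    mode == "mirror",
	}
}
```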
**Mirror scope falls out of auth.** Whatever the client's bearer can see at upstream is what the cache can populate. Admin's bearer → mirror gets everything (full backup). Vendor's bearer → mirror is exactly that vendor's permitted subtree. No code distinguishes admin-vs-user — master-side ACL filtering does it.
#### Cache directory IS a normal ZDDC root
The cache directory layout is intentionally a regular ZDDC root: `<master>/foo/bar.txt` is stored at `<root>/foo/bar.txt`. No sidecar metadata files. The local file's `mtime` is set to the upstream's `Last-Modified` header (so revalidation via `If-Modified-Since` reflects the master's notion of file age, not local fetch time). A small `.zddc-upstream` marker file at the root records the upstream URL and first-cached-at timestamp, written once by `sync.Once` on first persist.
Two consequences:
- `zddc-server --root <cache-dir>` (without `--upstream`) serves whatever's been cached as a plain master. Useful for portable offline snapshots — tar the directory, hand it to a colleague, they have a working ZDDC.
- The master/client boundary is one flag: setting/unsetting `--upstream` switches behavior on the same on-disk root.
#### Pipeline
Phase 2 ships GET/HEAD only; writes are deferred to a later phase. For each incoming request:
1. **Directory request** (URL ends in `/`): always proxied live. Listing-cache support belongs with the mirror walker (phase 3) — the bare cache directory's contents only reflect visited files, so a local-walk listing would be misleading.
2. **File request, cache hit** (when `persist` is on): serve cached bytes via `http.ServeContent`, which handles `Range` requests and conditional GETs (304) natively. Header `X-ZDDC-Cache: hit`. A background goroutine fires an `If-Modified-Since` revalidate: on `304` no-op, on `200` overwrite the cache atomically, on `403`/`404` purge.
3. **File request, cache miss**: build an upstream request preserving `Range`, `If-Range`, `Accept`, `Accept-Encoding`; attach the configured bearer. Stream the response simultaneously to the client AND to a tmp file in the cache directory; rename atomically only on success. Header `X-ZDDC-Cache: miss`.
4. **Proxy mode** (no persist): same as miss but skip the tmp-file teeing. Header `X-ZDDC-Cache: proxy`.
5. **Network error + cached version exists**: serve the cached bytes with `X-ZDDC-Cache: offline`. (When the cache hits before any network attempt, the header is `hit` — there's no way to distinguish "hit while online" from "hit while offline" without an extra round-trip; the header tells the user "this is from disk," and the user infers freshness from context or a future explicit freshness probe.)
6. **Network error + no cached version**: `503 Service Unavailable` + `X-ZDDC-Cache: offline`.
Responses with `Cache-Control: no-store` or `Cache-Control: private` pass through but are not persisted. Non-200 responses (including 206 partial content) are forwarded but not persisted — caching a partial body would corrupt subsequent full-body reads.
Hop-by-hop headers per RFC 7230 §6.1 (`Connection`, `Keep-Alive`, `Transfer-Encoding`, etc.) are dropped from forwarded responses; Go's transport drops most automatically, but the cache layer adds a guard for the cases that slip through.
#### Multi-tenancy: explicitly out of scope (v1)
The local instance forwards a single bearer (loaded from `--bearer-file` at startup) regardless of who's calling locally. Single-user-trust on a laptop. For multi-user scenarios, run multiple instances on the same host, or front the local server with your own auth proxy that injects per-user bearers downstream — both options keep the cache layer's design surface minimal.
### Bearer token issuance
zddc-server issues its own bearer tokens for non-browser callers (CLI tools, scripts, downstream proxy/cache/mirror instances). The master is the identity provider; no external IDP, no JWKS rotation.


@ -22,7 +22,7 @@ If something in this CLAUDE.md conflicts with those, those win — and please up
This is a **monorepo of independent tools**, not one application:
- `archive/`, `transmittal/`, `classifier/`, `mdedit/`, `landing/`, `form/` — six self-contained HTML tools, each compiled to a single inlined HTML file in its own `dist/`. Most output `dist/tool.html`; **`landing/` outputs `dist/index.html`** (it's the project picker served at the root of `zddc-server`). The sixth tool, `form/`, is the schema-driven renderer for the form-data system (any `<name>.form.yaml` file in the tree becomes an editable form at `<path>/<name>.form.html`); see AGENTS.md "Form-data system" and ARCHITECTURE.md "Form Renderer".
- `zddc/` — Go HTTP server (separate sub-project; Go 1.24+). Two deployment shapes from the same binary: (1) **master** — owns a file tree under `ZDDC_ROOT`, applies `.zddc` ACL cascades, serves files / app HTML / archive listings. Two auth paths on master: `Authorization: Bearer <token>` validated against self-issued tokens at `<ZDDC_ROOT>/.zddc.d/tokens/<sha256-hex>` for CLI/scripted callers, or `X-Auth-Request-Email` injected by an upstream proxy for browser sessions. Self-service token UI at `/.tokens` + JSON API at `/.api/tokens`. (2) **client** — when `--upstream <url>` is set, the binary becomes a downstream proxy/cache/mirror (`zddc/internal/cache/`); master-side machinery is bypassed and `--root` becomes the cache directory. Three sub-modes via `--mode proxy|cache|mirror` (mirror is phase 3). Cache layout is a normal ZDDC root, so the cache dir can be served as a plain master if you unset `--upstream`. Marker file `.zddc-upstream` records provenance. `--no-auth` skips ACL enforcement entirely on this instance (distinct from `--insecure` which only relaxes the no-root-`.zddc` startup check); `--skip-tls-verify` is a separate flag for self-signed upstream certs. Cross-compiled binaries are produced by `./build` and live in `dist/release-output/` (gitignored); `./deploy` rsyncs them to `/srv/zddc/releases/` on the deploy host (Caddy serves them at `https://zddc.varasys.io/releases/`). The `helm/` charts in this repo build from source at deploy time.
- `shared/`: `base.css` plus shared JS modules (`zddc.js`, `hash.js`, `zddc-filter.js`, `theme.js`, `help.js`) included by every tool's build, and `build-lib.sh` (POSIX sh helpers sourced by every tool's `build.sh` AND by the top-level `build` for lockstep release helpers).
- **Two-repo + deploy-host model.** Source code lives here (`codeberg.org/VARASYS/ZDDC`). Hand-edited website content lives in a separate repo (`codeberg.org/VARASYS/ZDDC-website`, typically cloned at `~/src/zddc-website/` — just `index.html`, `reference.html`, `css/`, `js/`, `img/`; no releases, no LFS). The live site at `zddc.varasys.io` is served from `/srv/zddc/` on the deploy host: Caddy bind-mounts that path, and it's populated by `./deploy` from this repo's `dist/release-output/` plus `~/src/zddc-website/`. **Releases are NOT in any git history** — they're reproducible from this repo's `<tool>-vX.Y.Z` tags by checking out the tag and running `./build release X.Y.Z`. Per-version files (`<tool>_v<X.Y.Z>.html`) are immutable; partial-version pins (`<tool>_v<X.Y>.html`, `<tool>_v<X>.html`) and channel mirrors (`<tool>_{stable,beta,alpha}.html`) are symlinks; zddc-server has analogous `zddc-server_v<X.Y.Z>_<platform>` per-version binaries plus channel/partial-version symlinks plus `zddc-server_<X>.html` stub pages that fan out the four-platform download in one cell. **Install model:** local use is a download from `/releases/`. Server use is `zddc-server`, which has the current-stable build of all six tools baked in via `//go:embed` (compile-time default). Tools auto-served at folder-name-driven paths: `archive` everywhere, `classifier` in `Incoming`/`Working`/`Staging` subtrees, `mdedit` in `Working` subtrees, `transmittal` in `Staging` subtrees, `landing` only at root. Override via `.zddc apps:` cascade entry (channel/version/URL/path) — fetched once, cached at `<ZDDC_ROOT>/_app/`. Drop a real `.html` file at any path to override.
- `helm/` — example Helm charts for zddc-server (`zddc-server-prod/`, `zddc-server-dev/`). Both compile from source via init container. Operators copy `values.yaml.example` and customize. No secrets in repo.


@ -202,6 +202,83 @@ JSON API for automation (same auth as the page):
A user can only see and revoke their own tokens. Revoking another
user's token returns 404 to avoid leaking ownership.
## Client mode (proxy / cache / mirror)
The same `zddc-server` binary can run as a downstream client of another
zddc-server. Set `--upstream <url>` and the master-side machinery
(archive index, apps server, watcher, OPA decider, ACL middleware,
token store) is replaced by a thin caching HTTP layer that forwards to
the master and (optionally) persists responses under `--root`.
Three modes via `--mode`:
| Mode | Persists responses? | Subtree warmer? | Use case |
|---|---|---|---|
| `proxy` | no | no | thin pass-through; nothing on local disk |
| `cache` (default) | yes | no | field engineer — what you've viewed is available offline |
| `mirror` | yes | yes (phase 3) | vendor mirrors, admin backups, complete offline working set |
The cache directory layout is a normal ZDDC root: `<master>/foo/bar.txt`
is stored at `<root>/foo/bar.txt`. No sidecar metadata. Running
`zddc-server --root <cache-dir>` (without `--upstream`) serves the
cached files as a plain master — useful for portable offline snapshots.
A small marker file `.zddc-upstream` is written to the cache root on
first persist, recording the upstream URL and first-cached-at timestamp.
It provides ops provenance and helps guard against accidentally
pointing master mode at a cache directory.
### Flags
| Flag / env | Purpose |
|---|---|
| `--upstream <url>` / `ZDDC_UPSTREAM` | Master URL (e.g. `https://master.example.com`). Setting this enables client mode. |
| `--mode <proxy\|cache\|mirror>` / `ZDDC_MODE` | Default `cache`. Ignored when `--upstream` is empty. |
| `--bearer-file <path>` / `ZDDC_BEARER_FILE` | Path to a 0600 file with a master-issued token (see `/.tokens` on the master). Forwarded as `Authorization: Bearer …` on every upstream request. |
| `--skip-tls-verify` / `ZDDC_SKIP_TLS_VERIFY` | Accept self-signed / untrusted upstream certs. Distinct from `--no-auth`. Dev / internal-CA scenarios only. |
| `--no-auth` / `ZDDC_NO_AUTH` | Skip ACL enforcement on incoming requests to the local instance. Typical for personal field-engineer / cache deployments, where the laptop is a single-user trust boundary and the master has already filtered the content. |
### Pipeline
For each incoming `GET` or `HEAD` (writes are not yet supported in client mode):
1. **Directory request** (URL ends in `/`): always proxied live. No listing cache yet (phase 3 / mirror mode).
2. **File request, cache hit**: serve cached bytes immediately with `X-ZDDC-Cache: hit`. Kick off a background `If-Modified-Since` revalidate; on `304` no-op, on `200` overwrite the cache, on `403`/`404` purge.
3. **File request, cache miss**: forward to upstream with the configured bearer. On `200`, stream the response simultaneously to the client and to a tmp file that's atomically renamed into the cache. Header `X-ZDDC-Cache: miss`.
4. **Network error and a cached version exists**: serve cached + `X-ZDDC-Cache: offline`.
5. **Network error and no cached version**: `503 Service Unavailable` with `X-ZDDC-Cache: offline`.
Range requests (`Range: bytes=...`) work end-to-end: forwarded to upstream on miss, served via `http.ServeContent` from disk on hit (which handles `Range` natively).
Responses with `Cache-Control: no-store` or `Cache-Control: private` are forwarded but not persisted.
### Two-instance dev recipe
```sh
# Master (your normal zddc-server). Pick any --root with a .zddc.
zddc-server --root /srv/zddc --addr :8443
# Client (any port; doesn't need TLS for local dev).
mkdir -p /tmp/zddc-mirror
zddc-server \
--upstream https://master.example.com:8443 \
--root /tmp/zddc-mirror \
--mode cache \
--bearer-file ~/.config/zddc/token \
--addr 127.0.0.1:8444 \
--tls-cert=none \
--no-auth
```
Browse `http://localhost:8444/`. Files you visit appear under `/tmp/zddc-mirror/`, mirroring the master's path layout. Disconnect and refresh: previously visited files keep working. Reconnect: every cache hit triggers a background revalidate, so master-side changes show up on the first reload after that revalidate completes.
### What client mode is NOT, yet
- **No write path**: any method other than `GET`/`HEAD` (`PUT`, `POST`, `DELETE`, …) returns `405` with `Allow: GET, HEAD`. The offline write outbox lands in a later phase.
- **No mirror walker**: `--mode mirror` is accepted but currently behaves like `cache` (no proactive prefetching). Phase 3 adds the access-triggered walk scheduler.
- **No listing cache**: directories always proxy live, so offline browsing of a directory you didn't visit while online won't show anything. Mirror mode + listing caching is phase 3.
- **No multi-tenancy**: the local instance forwards a single bearer to upstream regardless of who's calling locally. For multi-user deployments, run multiple instances or front the local server with your own auth proxy.
## Access control: the `.zddc` cascade
> ⚠️ **zddc-server refuses to start without a root `.zddc`.** A `ZDDC_ROOT` containing


@ -17,6 +17,7 @@ import (
"codeberg.org/VARASYS/ZDDC/zddc/internal/apps"
"codeberg.org/VARASYS/ZDDC/zddc/internal/archive"
"codeberg.org/VARASYS/ZDDC/zddc/internal/auth"
"codeberg.org/VARASYS/ZDDC/zddc/internal/cache"
"codeberg.org/VARASYS/ZDDC/zddc/internal/config"
"codeberg.org/VARASYS/ZDDC/zddc/internal/handler"
"codeberg.org/VARASYS/ZDDC/zddc/internal/policy"
@ -74,6 +75,18 @@ func main() {
"addr", cfg.Addr,
"embedded_apps", embeddedVersionsForLog(embedded))
// Client mode short-circuit: when cfg.Upstream is set, this binary
// runs as a downstream proxy/cache/mirror rather than a master.
// The master-side machinery below (archive index, watcher, apps
// server, policy decider, ACL middleware, token store) is all
// skipped — every request flows through the cache layer, which
// forwards to upstream and (in cache/mirror modes) persists the
// response under cfg.Root.
if cfg.Upstream != "" {
runClient(cfg)
return
}
// Build archive index
slog.Info("building archive index...")
start := time.Now()
@ -255,6 +268,91 @@ func main() {
slog.Info("stopped")
}
// runClient is the entry point when cfg.Upstream is set — a separate
// lifecycle from the master-side main(), with no archive index, no
// apps server, no watcher, no policy decider, no ACL middleware, no
// token store. The cache layer (zddc/internal/cache) is the entire
// request handler; CORS, HSTS (TLS only), access logging, and gzip
// wrap it the same way they wrap dispatch in master mode.
func runClient(cfg config.Config) {
cacheLayer, err := cache.New(cfg)
if err != nil {
slog.Error("client mode init failed", "err", err)
os.Exit(1)
}
slog.Info("client mode active",
"upstream", cacheLayer.Upstream(),
"mode", cacheLayer.Mode(),
"no_auth", cfg.NoAuth,
"skip_tls_verify", cfg.SkipTLSVerify)
if cfg.NoAuth {
slog.Warn("--no-auth enabled: incoming requests are not ACL-checked locally; trusting upstream's filtering.")
}
tlsCfg, useTLS, err := tlsutil.TLSConfig(cfg)
if err != nil {
slog.Error("failed to configure TLS", "err", err)
os.Exit(1)
}
ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
defer cancel()
auditLogger := setupAccessAuditLog(cfg.AccessLog)
var inner http.Handler = cacheLayer
inner = handler.CORSMiddleware(cfg, inner)
if useTLS {
inner = handler.HSTSMiddleware(inner)
}
inner = handler.AccessLogMiddleware(auditLogger, inner)
mux := http.NewServeMux()
mux.Handle("/", inner)
gzWrapper, err := newGzipWrapper()
if err != nil {
slog.Error("gzhttp wrapper init", "err", err)
os.Exit(1)
}
srv := &http.Server{
Addr: cfg.Addr,
Handler: gzWrapper(mux),
TLSConfig: tlsCfg,
ReadHeaderTimeout: 10 * time.Second,
ReadTimeout: 60 * time.Second,
WriteTimeout: 60 * time.Second,
IdleTimeout: 120 * time.Second,
}
if useTLS {
go func() {
slog.Info("listening", "addr", cfg.Addr, "tls", true, "client_mode", true)
if err := srv.ListenAndServeTLS("", ""); err != nil && err != http.ErrServerClosed {
slog.Error("server error", "err", err)
cancel()
}
}()
} else {
go func() {
slog.Info("listening", "addr", cfg.Addr, "tls", false, "client_mode", true)
if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
slog.Error("server error", "err", err)
cancel()
}
}()
}
<-ctx.Done()
slog.Info("shutting down...")
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
defer shutdownCancel()
if err := srv.Shutdown(shutdownCtx); err != nil {
slog.Error("shutdown error", "err", err)
}
slog.Info("stopped")
}
// setupAccessAuditLog constructs a slog.Logger writing JSON lines to a
// size-rotated file at the operator-configured path. Returns nil if no
// path is configured (operator opted out via --access-log=) —

zddc/internal/cache/cache.go

@ -0,0 +1,479 @@
// Package cache implements zddc-server's client mode: a downstream
// proxy/cache/mirror that runs the same binary against a master.
// Configured via cfg.Upstream (in main.go), the cache layer replaces
// the master-side dispatcher entirely — every incoming request is
// forwarded to the master with the local instance's bearer token, and
// (in cache or mirror mode) the response body is persisted under
// cfg.Root so subsequent requests serve from disk.
//
// The cache directory layout is intentionally a normal ZDDC root: a
// file fetched from `<master>/foo/bar.txt` is stored at `<root>/foo/
// bar.txt`. No sidecar metadata. The local file's mtime is set to the
// upstream's Last-Modified header so revalidation via
// If-Modified-Since reflects the master's notion of the file's age,
// not when the local cache happened to fetch it. Running
// `zddc-server --root <cache-dir>` without --upstream serves the
// cached files as a regular ZDDC — useful for portable offline
// snapshots and sanity-check inspection.
//
// Phase 2 scope: GET/HEAD only. Range requests, stale-while-
// revalidate, and offline-fallback are supported. Directory listings
// are always proxied live (no listing cache yet); writes (PUT / POST /
// DELETE) and the mirror walker land in later phases.
package cache
import (
"crypto/tls"
"fmt"
"io"
"log/slog"
"net/http"
"net/url"
"os"
"path/filepath"
"strings"
"sync"
"time"
"codeberg.org/VARASYS/ZDDC/zddc/internal/config"
)
// MarkerFile records the upstream URL and first-cached-at timestamp
// in the cache root. Prevents accidentally pointing master mode at a
// cache directory and provides provenance for ops/users.
const MarkerFile = ".zddc-upstream"
// HeaderName is the response header that surfaces cache state to the
// client (and the browser-side UI). Values currently emitted: hit,
// miss, proxy, offline.
const HeaderName = "X-ZDDC-Cache"
// Cache is the request handler installed in main.go when cfg.Upstream
// is non-empty. It is safe for concurrent ServeHTTP calls.
type Cache struct {
root string // local cache directory (== cfg.Root in client mode)
upstream string // upstream master URL, no trailing slash
bearer string // forwarded as Authorization: Bearer to upstream; "" disables
mode string // "proxy" | "cache" | "mirror"
persist bool // mode != "proxy" — write responses to disk
client *http.Client
markerOnce sync.Once
}
// New constructs a Cache from the loaded configuration. Validates
// upstream URL, reads the bearer-file (if configured), prepares the
// HTTP client honoring SkipTLSVerify, and ensures the cache root
// exists.
func New(cfg config.Config) (*Cache, error) {
if cfg.Upstream == "" {
return nil, fmt.Errorf("cache.New: cfg.Upstream is empty")
}
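upstream := strings.TrimRight(cfg.Upstream, "/")
u, err := url.Parse(upstream)
if err != nil {
return nil, fmt.Errorf("cache.New: invalid upstream %q: %w", upstream, err)
}
// url.Parse accepts almost anything; require an absolute http(s) URL.
if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
return nil, fmt.Errorf("cache.New: upstream %q must be an absolute http(s) URL", upstream)
}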
bearer := ""
if cfg.BearerFile != "" {
b, err := os.ReadFile(cfg.BearerFile)
if err != nil {
return nil, fmt.Errorf("cache.New: read bearer file: %w", err)
}
bearer = strings.TrimSpace(string(b))
if bearer == "" {
return nil, fmt.Errorf("cache.New: bearer file %q is empty", cfg.BearerFile)
}
}
transport := &http.Transport{
MaxIdleConns: 10,
IdleConnTimeout: 30 * time.Second,
TLSHandshakeTimeout: 10 * time.Second,
ResponseHeaderTimeout: 30 * time.Second,
}
if cfg.SkipTLSVerify {
// G402 / CWE-295: deliberate. Documented operator opt-in for
// dev/internal-CA scenarios; never the default.
transport.TLSClientConfig = &tls.Config{InsecureSkipVerify: true} //nolint:gosec
slog.Warn("--skip-tls-verify enabled: upstream TLS certificates will NOT be validated")
}
if err := os.MkdirAll(cfg.Root, 0o755); err != nil {
return nil, fmt.Errorf("cache.New: create cache root %q: %w", cfg.Root, err)
}
mode := cfg.Mode
if mode == "" {
mode = "cache"
}
return &Cache{
root: filepath.Clean(cfg.Root), // cleaned so cachePathFor's prefix check matches filepath.Join output
upstream: upstream,
bearer: bearer,
mode: mode,
persist: mode != "proxy",
client: &http.Client{
Transport: transport,
Timeout: 60 * time.Second,
// Don't follow redirects automatically — pass them through to
// the client so the browser can update its address bar
// (e.g. master's no-trailing-slash → trailing-slash 301).
CheckRedirect: func(req *http.Request, via []*http.Request) error {
return http.ErrUseLastResponse
},
},
}, nil
}
// Mode returns the configured mode label for diagnostics.
func (c *Cache) Mode() string { return c.mode }
// Upstream returns the upstream master URL for diagnostics.
func (c *Cache) Upstream() string { return c.upstream }
// ServeHTTP is the cache layer's HTTP entry point. Replaces the
// master-side dispatcher in client mode.
func (c *Cache) ServeHTTP(w http.ResponseWriter, r *http.Request) {
// Phase 2: read-only. Writes are deferred to the outbox phase.
// HEAD is accepted alongside GET: a cache hit answers it from disk
// via http.ServeContent (headers only), and a miss forwards it
// upstream as HEAD without persisting anything.
if r.Method != http.MethodGet && r.Method != http.MethodHead {
w.Header().Set("Allow", "GET, HEAD")
http.Error(w, "Method Not Allowed: writes are not yet supported in client mode", http.StatusMethodNotAllowed)
return
}
// Directory listings are always proxied live in v1. The cache
// directory's actual filesystem listing would be inaccurate (it
// only contains visited files), and full listing-cache support
// belongs with the mirror walker in phase 3.
if strings.HasSuffix(r.URL.Path, "/") {
c.proxy(w, r, false /* writeToCache */)
return
}
// File request — try cache first when persisting.
if c.persist {
if path, ok := c.cachePathFor(r.URL.Path); ok {
info, err := os.Stat(path)
if err == nil && !info.IsDir() {
c.serveFromDisk(w, r, path, info, "hit")
// Background revalidate; never block the user response.
go c.revalidate(r.URL.Path, info.ModTime())
return
}
}
}
// Miss (or proxy mode) → forward to upstream and (optionally)
// persist on the way through.
c.proxy(w, r, c.persist)
}
// proxy forwards the request to upstream and serves the response back
// to the client. When writeToCache is true and the response is a
// cacheable 200, the body is also persisted under cfg.Root.
func (c *Cache) proxy(w http.ResponseWriter, r *http.Request, writeToCache bool) {
upReq, err := c.buildUpstreamRequest(r)
if err != nil {
http.Error(w, "Bad Request: "+err.Error(), http.StatusBadRequest)
return
}
resp, err := c.client.Do(upReq)
if err != nil {
// Network error. If we have a cached copy, serve it stale.
if writeToCache && r.Method == http.MethodGet {
if path, ok := c.cachePathFor(r.URL.Path); ok {
if info, sErr := os.Stat(path); sErr == nil && !info.IsDir() {
c.serveFromDisk(w, r, path, info, "offline")
return
}
}
}
slog.Warn("upstream fetch failed", "url", upReq.URL.String(), "err", err)
w.Header().Set(HeaderName, "offline")
http.Error(w, "Service Unavailable: upstream unreachable", http.StatusServiceUnavailable)
return
}
defer resp.Body.Close()
// Forward upstream response headers. Skip hop-by-hop headers (RFC
// 7230 §6.1) — Go's transport already drops most, but Connection
// and Transfer-Encoding can sneak through and confuse the client.
for k, vv := range resp.Header {
if isHopByHop(k) {
continue
}
for _, v := range vv {
w.Header().Add(k, v)
}
}
// Only a GET can populate the cache: a HEAD response carries no body.
cacheable := writeToCache && r.Method == http.MethodGet &&
resp.StatusCode == http.StatusOK && c.responseCacheable(resp)
if cacheable {
w.Header().Set(HeaderName, "miss")
} else {
w.Header().Set(HeaderName, "proxy")
}
w.WriteHeader(resp.StatusCode)
if r.Method == http.MethodHead || resp.StatusCode == http.StatusNotModified {
return
}
if !cacheable {
_, _ = io.Copy(w, resp.Body)
return
}
// Stream body to client AND to a tmp file in the cache; rename
// atomically only on success.
if err := c.streamAndPersist(w, resp, r.URL.Path); err != nil {
// Mid-stream error: the client got a partial body (HTTP-normal),
// and we already abandoned the cache write. Just log.
slog.Debug("stream-and-persist error", "url", r.URL.Path, "err", err)
} else {
c.maybeWriteMarker()
}
}
// buildUpstreamRequest constructs the outbound request preserving the
// path, query, and the Range, If-Range, and Accept headers. Adds the
// bearer if configured.
func (c *Cache) buildUpstreamRequest(r *http.Request) (*http.Request, error) {
target := c.upstream + r.URL.RequestURI()
upReq, err := http.NewRequestWithContext(r.Context(), r.Method, target, nil)
if err != nil {
return nil, err
}
// Preserve the Range header for resumable / partial transfers.
if v := r.Header.Get("Range"); v != "" {
upReq.Header.Set("Range", v)
}
if v := r.Header.Get("If-Range"); v != "" {
upReq.Header.Set("If-Range", v)
}
if v := r.Header.Get("Accept"); v != "" {
upReq.Header.Set("Accept", v)
}
// Accept-Encoding is deliberately NOT forwarded. Setting it by hand
// disables http.Transport's transparent gzip decoding, so a gzipped
// body would be persisted verbatim and later served from disk
// without a Content-Encoding header. Leaving it unset lets the
// transport negotiate and decompress; the gzhttp wrapper installed
// by runClient re-compresses for the downstream client.
upReq.Header.Set("User-Agent", "zddc-server-cache/0.1")
if c.bearer != "" {
upReq.Header.Set("Authorization", "Bearer "+c.bearer)
}
return upReq, nil
}
// responseCacheable reports whether the response body should be
// persisted. Honors Cache-Control: no-store / private and refuses to
// cache responses without a content body (ranges, 204, etc.).
func (c *Cache) responseCacheable(resp *http.Response) bool {
cc := resp.Header.Get("Cache-Control")
low := strings.ToLower(cc)
if strings.Contains(low, "no-store") || strings.Contains(low, "private") {
return false
}
// Don't cache partial-content responses — the server returned 206
// for a Range request, which means the body covers only part of
// the file. Caching that partial body would corrupt subsequent
// non-range fetches.
if resp.StatusCode != http.StatusOK {
return false
}
return true
}
// streamAndPersist writes resp.Body simultaneously to the client and
// to a temp file in the cache. Renames the temp atomically on success.
// Sets the local file's mtime to upstream's Last-Modified (if
// present) so subsequent revalidations send If-Modified-Since with a
// timestamp upstream can compare against its own state.
func (c *Cache) streamAndPersist(w http.ResponseWriter, resp *http.Response, urlPath string) error {
finalPath, ok := c.cachePathFor(urlPath)
if !ok {
_, err := io.Copy(w, resp.Body)
return err
}
if err := os.MkdirAll(filepath.Dir(finalPath), 0o755); err != nil {
_, copyErr := io.Copy(w, resp.Body)
if copyErr != nil {
return copyErr
}
return err
}
tmp, err := os.CreateTemp(filepath.Dir(finalPath), ".zddc-cache-tmp-*")
if err != nil {
_, copyErr := io.Copy(w, resp.Body)
if copyErr != nil {
return copyErr
}
return err
}
tmpName := tmp.Name()
mw := io.MultiWriter(tmp, w)
if _, err := io.Copy(mw, resp.Body); err != nil {
_ = tmp.Close()
_ = os.Remove(tmpName)
return err
}
if err := tmp.Close(); err != nil {
_ = os.Remove(tmpName)
return err
}
if lm := resp.Header.Get("Last-Modified"); lm != "" {
if t, err := http.ParseTime(lm); err == nil {
_ = os.Chtimes(tmpName, t, t)
}
}
return os.Rename(tmpName, finalPath)
}
// serveFromDisk serves a cached file via http.ServeContent (which
// handles Range requests, If-Modified-Since, and conditional GETs
// natively). cacheState is the X-ZDDC-Cache value to surface.
func (c *Cache) serveFromDisk(w http.ResponseWriter, r *http.Request, path string, info os.FileInfo, cacheState string) {
f, err := os.Open(path)
if err != nil {
http.Error(w, "Internal Server Error", http.StatusInternalServerError)
return
}
defer f.Close()
w.Header().Set(HeaderName, cacheState)
http.ServeContent(w, r, filepath.Base(path), info.ModTime(), f)
}
// revalidate fires a conditional GET against upstream after a cache
// hit. 304 = no-op (cache is fresh). 200 = update cache. 403/404 =
// purge (ACL revoked or upstream deleted). Network errors are
// swallowed — staleness via offline is the documented behavior.
func (c *Cache) revalidate(urlPath string, mtime time.Time) {
target := c.upstream + urlPath
req, err := http.NewRequest(http.MethodGet, target, nil)
if err != nil {
return
}
if !mtime.IsZero() {
req.Header.Set("If-Modified-Since", mtime.UTC().Format(http.TimeFormat))
}
if c.bearer != "" {
req.Header.Set("Authorization", "Bearer "+c.bearer)
}
resp, err := c.client.Do(req)
if err != nil {
return
}
defer resp.Body.Close()
switch resp.StatusCode {
case http.StatusNotModified:
return
case http.StatusOK:
if !c.responseCacheable(resp) {
return
}
if err := c.persistOnly(resp, urlPath); err != nil {
slog.Debug("revalidate persist error", "url", urlPath, "err", err)
}
case http.StatusForbidden, http.StatusNotFound:
if path, ok := c.cachePathFor(urlPath); ok {
_ = os.Remove(path)
slog.Info("purged cached entry after upstream 4xx", "url", urlPath, "status", resp.StatusCode)
}
}
}
// persistOnly writes resp.Body to the cache without forwarding it
// anywhere. Used by revalidate (the user's request was already served
// from disk; we just refresh the cache in the background).
func (c *Cache) persistOnly(resp *http.Response, urlPath string) error {
finalPath, ok := c.cachePathFor(urlPath)
if !ok {
_, _ = io.Copy(io.Discard, resp.Body)
return nil
}
if err := os.MkdirAll(filepath.Dir(finalPath), 0o755); err != nil {
_, _ = io.Copy(io.Discard, resp.Body)
return err
}
tmp, err := os.CreateTemp(filepath.Dir(finalPath), ".zddc-cache-tmp-*")
if err != nil {
_, _ = io.Copy(io.Discard, resp.Body)
return err
}
tmpName := tmp.Name()
if _, err := io.Copy(tmp, resp.Body); err != nil {
_ = tmp.Close()
_ = os.Remove(tmpName)
return err
}
if err := tmp.Close(); err != nil {
_ = os.Remove(tmpName)
return err
}
if lm := resp.Header.Get("Last-Modified"); lm != "" {
if t, err := http.ParseTime(lm); err == nil {
_ = os.Chtimes(tmpName, t, t)
}
}
return os.Rename(tmpName, finalPath)
}
// cachePathFor maps a URL path to a local filesystem path under the
// cache root. Returns ok=false on inputs that would escape the root,
// reserve a marker filename, or otherwise be unsafe to write.
func (c *Cache) cachePathFor(urlPath string) (string, bool) {
if urlPath == "" || urlPath == "/" {
return "", false
}
if strings.Contains(urlPath, "..") {
return "", false
}
clean := filepath.FromSlash(strings.TrimPrefix(urlPath, "/"))
abs := filepath.Join(c.root, clean)
if !strings.HasPrefix(abs, c.root+string(filepath.Separator)) && abs != c.root {
return "", false
}
// Don't let URLs collide with internal markers.
if filepath.Base(abs) == MarkerFile {
return "", false
}
return abs, true
}
// maybeWriteMarker writes the .zddc-upstream provenance file once,
// the first time the cache stores anything. Best-effort: an error
// here doesn't fail the request.
func (c *Cache) maybeWriteMarker() {
c.markerOnce.Do(func() {
marker := filepath.Join(c.root, MarkerFile)
if _, err := os.Stat(marker); err == nil {
return
}
body := fmt.Sprintf("upstream: %s\nfirst_cached: %s\nmode: %s\n",
c.upstream, time.Now().UTC().Format(time.RFC3339), c.mode)
_ = os.WriteFile(marker, []byte(body), 0o644)
})
}
// isHopByHop reports whether a header name is hop-by-hop per RFC 7230
// §6.1 — these must not be forwarded by a proxy.
func isHopByHop(name string) bool {
switch http.CanonicalHeaderKey(name) {
case "Connection",
"Keep-Alive",
"Proxy-Authenticate",
"Proxy-Authorization",
"Te",
"Trailer",
"Transfer-Encoding",
"Upgrade":
return true
}
return false
}

zddc/internal/cache/cache_test.go

@ -0,0 +1,546 @@
package cache
import (
"io"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"strings"
"sync"
"sync/atomic"
"testing"
"time"
"codeberg.org/VARASYS/ZDDC/zddc/internal/config"
)
// newTestCache spins up an httptest server as the upstream and
// returns the cache + the upstream's URL. The upstream's behavior is
// the caller's to define.
func newTestCache(t *testing.T, mode string, upstreamHandler http.HandlerFunc) (*Cache, *httptest.Server) {
t.Helper()
upstream := httptest.NewServer(upstreamHandler)
t.Cleanup(upstream.Close)
root := t.TempDir()
c, err := New(config.Config{
Root: root,
Upstream: upstream.URL,
Mode: mode,
})
if err != nil {
t.Fatalf("New: %v", err)
}
return c, upstream
}
func TestNew_RequiresUpstream(t *testing.T) {
if _, err := New(config.Config{Root: t.TempDir()}); err == nil {
t.Error("expected error for empty upstream")
}
}
func TestNew_StripsTrailingSlash(t *testing.T) {
c, err := New(config.Config{
Root: t.TempDir(),
Upstream: "http://example.com/",
})
if err != nil {
t.Fatalf("New: %v", err)
}
if got := c.Upstream(); got != "http://example.com" {
t.Errorf("Upstream() = %q, want trailing slash stripped", got)
}
}
func TestNew_BearerFile(t *testing.T) {
dir := t.TempDir()
tokenPath := filepath.Join(dir, "token")
if err := os.WriteFile(tokenPath, []byte(" abc123\n"), 0o600); err != nil {
t.Fatalf("write token: %v", err)
}
c, err := New(config.Config{
Root: t.TempDir(),
Upstream: "http://example.com",
BearerFile: tokenPath,
})
if err != nil {
t.Fatalf("New: %v", err)
}
if c.bearer != "abc123" {
t.Errorf("bearer = %q, want abc123 (whitespace trimmed)", c.bearer)
}
}
func TestNew_BearerFileEmptyRejected(t *testing.T) {
dir := t.TempDir()
empty := filepath.Join(dir, "empty")
_ = os.WriteFile(empty, []byte("\n\n"), 0o600)
if _, err := New(config.Config{
Root: t.TempDir(),
Upstream: "http://example.com",
BearerFile: empty,
}); err == nil {
t.Error("expected error for empty bearer file")
}
}
func TestServeHTTP_RejectsWriteMethods(t *testing.T) {
c, _ := newTestCache(t, "cache", func(w http.ResponseWriter, r *http.Request) {
t.Errorf("upstream should not be called for write methods")
})
for _, method := range []string{http.MethodPut, http.MethodPost, http.MethodDelete} {
rec := httptest.NewRecorder()
r := httptest.NewRequest(method, "/foo", nil)
c.ServeHTTP(rec, r)
if rec.Code != http.StatusMethodNotAllowed {
t.Errorf("%s = %d, want 405", method, rec.Code)
}
if got := rec.Header().Get("Allow"); got != "GET, HEAD" {
t.Errorf("%s Allow = %q", method, got)
}
}
}
func TestServeHTTP_MissThenHit(t *testing.T) {
var hits int32
c, upstream := newTestCache(t, "cache", func(w http.ResponseWriter, r *http.Request) {
atomic.AddInt32(&hits, 1)
if r.URL.Path != "/foo.txt" {
t.Errorf("upstream got %q, want /foo.txt", r.URL.Path)
}
w.Header().Set("Content-Type", "text/plain")
w.Header().Set("Last-Modified", "Mon, 02 Jan 2006 15:04:05 GMT")
_, _ = w.Write([]byte("hello"))
})
_ = upstream
// First request: miss.
rec := httptest.NewRecorder()
r := httptest.NewRequest(http.MethodGet, "/foo.txt", nil)
c.ServeHTTP(rec, r)
if rec.Code != http.StatusOK {
t.Fatalf("first GET = %d", rec.Code)
}
if got := rec.Header().Get(HeaderName); got != "miss" {
t.Errorf("first cache header = %q, want miss", got)
}
if got := rec.Body.String(); got != "hello" {
t.Errorf("body = %q", got)
}
// Cache file should exist.
cached := filepath.Join(c.root, "foo.txt")
if _, err := os.Stat(cached); err != nil {
t.Fatalf("expected cached file: %v", err)
}
// Second request: served from disk as a hit (this also starts a background revalidate).
rec2 := httptest.NewRecorder()
r2 := httptest.NewRequest(http.MethodGet, "/foo.txt", nil)
c.ServeHTTP(rec2, r2)
if rec2.Code != http.StatusOK {
t.Fatalf("second GET = %d", rec2.Code)
}
if got := rec2.Header().Get(HeaderName); got != "hit" {
t.Errorf("second cache header = %q, want hit", got)
}
if got := rec2.Body.String(); got != "hello" {
t.Errorf("second body = %q", got)
}
// Marker file should be present.
marker := filepath.Join(c.root, MarkerFile)
mb, err := os.ReadFile(marker)
if err != nil {
t.Fatalf("marker missing: %v", err)
}
if !strings.Contains(string(mb), "upstream:") {
t.Errorf("marker contents unexpected: %s", string(mb))
}
}
func TestServeHTTP_ProxyModeDoesNotPersist(t *testing.T) {
c, _ := newTestCache(t, "proxy", func(w http.ResponseWriter, r *http.Request) {
_, _ = w.Write([]byte("payload"))
})
rec := httptest.NewRecorder()
r := httptest.NewRequest(http.MethodGet, "/foo.txt", nil)
c.ServeHTTP(rec, r)
if rec.Code != http.StatusOK {
t.Fatalf("status = %d", rec.Code)
}
if got := rec.Header().Get(HeaderName); got != "proxy" {
t.Errorf("cache header = %q, want proxy", got)
}
cached := filepath.Join(c.root, "foo.txt")
if _, err := os.Stat(cached); !os.IsNotExist(err) {
t.Errorf("proxy mode wrote to cache: %v", err)
}
// Marker also shouldn't exist (no caching happened).
if _, err := os.Stat(filepath.Join(c.root, MarkerFile)); !os.IsNotExist(err) {
t.Errorf("marker file written in proxy mode")
}
}
func TestServeHTTP_DirectoriesAreNeverCached(t *testing.T) {
c, _ := newTestCache(t, "cache", func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "text/html")
_, _ = w.Write([]byte("<html>listing</html>"))
})
rec := httptest.NewRecorder()
r := httptest.NewRequest(http.MethodGet, "/Project/", nil)
c.ServeHTTP(rec, r)
if rec.Code != http.StatusOK {
t.Fatalf("status = %d", rec.Code)
}
if got := rec.Header().Get(HeaderName); got != "proxy" {
t.Errorf("cache header = %q, want proxy (directories don't cache)", got)
}
// No file or directory should have been created at the URL location.
if entries, _ := os.ReadDir(c.root); len(entries) > 0 {
t.Errorf("directory request created cache entries: %v", entries)
}
}
func TestServeHTTP_HEAD_HitDoesNotReturnBody(t *testing.T) {
c, _ := newTestCache(t, "cache", func(w http.ResponseWriter, r *http.Request) {
_, _ = w.Write([]byte("hello"))
})
// Seed the cache via GET.
rec := httptest.NewRecorder()
c.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/foo.txt", nil))
if rec.Code != http.StatusOK {
t.Fatalf("seed: %d", rec.Code)
}
// HEAD: should be a hit, no body.
rec2 := httptest.NewRecorder()
c.ServeHTTP(rec2, httptest.NewRequest(http.MethodHead, "/foo.txt", nil))
if rec2.Code != http.StatusOK {
t.Fatalf("HEAD: %d", rec2.Code)
}
if rec2.Body.Len() != 0 {
t.Errorf("HEAD body length = %d, want 0", rec2.Body.Len())
}
}
func TestServeHTTP_OfflineServesStale(t *testing.T) {
root := t.TempDir()
// Pre-seed a cached file.
if err := os.WriteFile(filepath.Join(root, "stale.txt"), []byte("stale-content"), 0o644); err != nil {
t.Fatalf("seed: %v", err)
}
c, err := New(config.Config{
Root: root,
Upstream: "http://127.0.0.1:1", // unreachable port
Mode: "cache",
})
if err != nil {
t.Fatalf("New: %v", err)
}
// Speed up the timeout so the test doesn't hang.
c.client.Timeout = 200 * time.Millisecond
rec := httptest.NewRecorder()
r := httptest.NewRequest(http.MethodGet, "/stale.txt", nil)
c.ServeHTTP(rec, r)
if rec.Code != http.StatusOK {
t.Fatalf("offline-with-cache = %d, want 200", rec.Code)
}
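// A cached file is served from disk before upstream is contacted, so
// this reports "hit"; the background revalidate then fails silently
// against the unreachable upstream. "offline" is tolerated in case the
// lookup order changes.
if got := rec.Header().Get(HeaderName); got != "hit" && got != "offline" {
t.Errorf("cache header = %q, want hit or offline", got)
}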
if got := rec.Body.String(); got != "stale-content" {
t.Errorf("body = %q", got)
}
}
func TestServeHTTP_OfflineMissReturns503(t *testing.T) {
root := t.TempDir()
c, err := New(config.Config{
Root: root,
Upstream: "http://127.0.0.1:1",
Mode: "cache",
})
if err != nil {
t.Fatalf("New: %v", err)
}
c.client.Timeout = 200 * time.Millisecond
rec := httptest.NewRecorder()
r := httptest.NewRequest(http.MethodGet, "/never-cached.txt", nil)
c.ServeHTTP(rec, r)
if rec.Code != http.StatusServiceUnavailable {
t.Errorf("offline-no-cache = %d, want 503", rec.Code)
}
if got := rec.Header().Get(HeaderName); got != "offline" {
t.Errorf("cache header = %q, want offline", got)
}
}
func TestServeHTTP_BearerForwarded(t *testing.T) {
dir := t.TempDir()
tokenPath := filepath.Join(dir, "token")
_ = os.WriteFile(tokenPath, []byte("secrettoken"), 0o600)
var seenAuth string
upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
seenAuth = r.Header.Get("Authorization")
_, _ = w.Write([]byte("ok"))
}))
defer upstream.Close()
c, err := New(config.Config{
Root: t.TempDir(),
Upstream: upstream.URL,
Mode: "cache",
BearerFile: tokenPath,
})
if err != nil {
t.Fatalf("New: %v", err)
}
rec := httptest.NewRecorder()
c.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/foo.txt", nil))
if seenAuth != "Bearer secrettoken" {
t.Errorf("Authorization = %q, want Bearer secrettoken", seenAuth)
}
}
func TestServeHTTP_PreservesQuery(t *testing.T) {
var seenURL string
c, _ := newTestCache(t, "cache", func(w http.ResponseWriter, r *http.Request) {
seenURL = r.URL.RequestURI()
w.Header().Set("Cache-Control", "no-store") // keep this response out of the cache
_, _ = w.Write([]byte(`{}`))
})
rec := httptest.NewRecorder()
c.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/foo.txt?q=bar", nil))
if seenURL != "/foo.txt?q=bar" {
t.Errorf("upstream saw %q, want /foo.txt?q=bar", seenURL)
}
}
func TestServeHTTP_HonorsNoStore(t *testing.T) {
c, _ := newTestCache(t, "cache", func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Cache-Control", "no-store")
_, _ = w.Write([]byte("ephemeral"))
})
rec := httptest.NewRecorder()
c.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/dynamic.json", nil))
if rec.Code != http.StatusOK {
t.Fatalf("status: %d", rec.Code)
}
if got := rec.Header().Get(HeaderName); got != "proxy" {
t.Errorf("cache header = %q, want proxy (no-store should bypass cache)", got)
}
cached := filepath.Join(c.root, "dynamic.json")
if _, err := os.Stat(cached); !os.IsNotExist(err) {
t.Errorf("no-store response was cached")
}
}
func TestServeHTTP_PathTraversalRejected(t *testing.T) {
c, _ := newTestCache(t, "cache", func(w http.ResponseWriter, r *http.Request) {
_, _ = w.Write([]byte("data"))
})
rec := httptest.NewRecorder()
c.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/../etc/passwd", nil))
// The upstream may still be called (the proxy doesn't gatekeep), but
// we MUST NOT cache to a path that escapes the root.
root := c.root
parent := filepath.Dir(root)
if _, err := os.Stat(filepath.Join(parent, "etc", "passwd")); !os.IsNotExist(err) {
t.Error("path traversal wrote outside cache root")
}
}
func TestServeHTTP_ForwardsErrorStatus(t *testing.T) {
c, _ := newTestCache(t, "cache", func(w http.ResponseWriter, r *http.Request) {
http.Error(w, "Forbidden", http.StatusForbidden)
})
rec := httptest.NewRecorder()
c.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/secret.txt", nil))
if rec.Code != http.StatusForbidden {
t.Errorf("status = %d, want 403", rec.Code)
}
cached := filepath.Join(c.root, "secret.txt")
if _, err := os.Stat(cached); !os.IsNotExist(err) {
t.Error("403 response was cached")
}
}
func TestRevalidate_PurgesOn403(t *testing.T) {
root := t.TempDir()
if err := os.WriteFile(filepath.Join(root, "victim.txt"), []byte("cached"), 0o644); err != nil {
t.Fatalf("seed: %v", err)
}
upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
http.Error(w, "Forbidden", http.StatusForbidden)
}))
defer upstream.Close()
c, err := New(config.Config{Root: root, Upstream: upstream.URL, Mode: "cache"})
if err != nil {
t.Fatalf("New: %v", err)
}
c.revalidate("/victim.txt", time.Now())
if _, err := os.Stat(filepath.Join(root, "victim.txt")); !os.IsNotExist(err) {
t.Error("revalidate did not purge after 403")
}
}
func TestRevalidate_PurgesOn404(t *testing.T) {
root := t.TempDir()
if err := os.WriteFile(filepath.Join(root, "gone.txt"), []byte("cached"), 0o644); err != nil {
t.Fatalf("seed: %v", err)
}
upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
http.NotFound(w, r)
}))
defer upstream.Close()
c, err := New(config.Config{Root: root, Upstream: upstream.URL, Mode: "cache"})
if err != nil {
t.Fatalf("New: %v", err)
}
c.revalidate("/gone.txt", time.Now())
if _, err := os.Stat(filepath.Join(root, "gone.txt")); !os.IsNotExist(err) {
t.Error("revalidate did not purge after 404")
}
}
func TestRevalidate_NoPurgeOn200ButRefreshes(t *testing.T) {
root := t.TempDir()
old := []byte("old-content")
if err := os.WriteFile(filepath.Join(root, "fresh.txt"), old, 0o644); err != nil {
t.Fatalf("seed: %v", err)
}
// Set the file's mtime to an hour ago.
hourAgo := time.Now().Add(-time.Hour)
if err := os.Chtimes(filepath.Join(root, "fresh.txt"), hourAgo, hourAgo); err != nil {
t.Fatalf("chtimes: %v", err)
}
upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
_, _ = w.Write([]byte("new-content"))
}))
defer upstream.Close()
c, err := New(config.Config{Root: root, Upstream: upstream.URL, Mode: "cache"})
if err != nil {
t.Fatalf("New: %v", err)
}
c.revalidate("/fresh.txt", hourAgo)
got, _ := os.ReadFile(filepath.Join(root, "fresh.txt"))
if string(got) != "new-content" {
t.Errorf("revalidate did not refresh: got %q", string(got))
}
}
func TestRevalidate_NoOpOn304(t *testing.T) {
root := t.TempDir()
original := []byte("original")
if err := os.WriteFile(filepath.Join(root, "still.txt"), original, 0o644); err != nil {
t.Fatalf("seed: %v", err)
}
upstream := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Always answer 304, but first check that revalidate sent If-Modified-Since.
if r.Header.Get("If-Modified-Since") == "" {
t.Errorf("revalidate did not send If-Modified-Since")
}
w.WriteHeader(http.StatusNotModified)
}))
defer upstream.Close()
c, err := New(config.Config{Root: root, Upstream: upstream.URL, Mode: "cache"})
if err != nil {
t.Fatalf("New: %v", err)
}
c.revalidate("/still.txt", time.Now())
got, _ := os.ReadFile(filepath.Join(root, "still.txt"))
if string(got) != "original" {
t.Errorf("304 caused content change: got %q", string(got))
}
}
func TestRangeRequest_Hit(t *testing.T) {
c, _ := newTestCache(t, "cache", func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "text/plain")
_, _ = w.Write([]byte("0123456789"))
})
// Seed cache.
rec := httptest.NewRecorder()
c.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/data.txt", nil))
if rec.Code != http.StatusOK {
t.Fatalf("seed: %d", rec.Code)
}
// Range request.
rec2 := httptest.NewRecorder()
r2 := httptest.NewRequest(http.MethodGet, "/data.txt", nil)
r2.Header.Set("Range", "bytes=2-5")
c.ServeHTTP(rec2, r2)
if rec2.Code != http.StatusPartialContent {
t.Fatalf("range = %d, want 206", rec2.Code)
}
if rec2.Body.String() != "2345" {
t.Errorf("range body = %q", rec2.Body.String())
}
if got := rec2.Header().Get("Content-Range"); !strings.HasPrefix(got, "bytes 2-5/") {
t.Errorf("Content-Range = %q", got)
}
}
func TestServeHTTP_ConcurrentRequestsForSameURL(t *testing.T) {
// Stress the marker-once and tmpfile path with parallel misses.
var hits int32
c, _ := newTestCache(t, "cache", func(w http.ResponseWriter, r *http.Request) {
atomic.AddInt32(&hits, 1)
_, _ = io.WriteString(w, "concurrent")
})
var wg sync.WaitGroup
for i := 0; i < 8; i++ {
wg.Add(1)
go func() {
defer wg.Done()
rec := httptest.NewRecorder()
c.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/c.txt", nil))
if rec.Code != http.StatusOK {
t.Errorf("status = %d", rec.Code)
}
if rec.Body.String() != "concurrent" {
t.Errorf("body = %q", rec.Body.String())
}
}()
}
wg.Wait()
// File should exist with the right content.
got, err := os.ReadFile(filepath.Join(c.root, "c.txt"))
if err != nil {
t.Fatalf("read: %v", err)
}
if string(got) != "concurrent" {
t.Errorf("cached body = %q", string(got))
}
}
func TestCachePathFor_Boundaries(t *testing.T) {
c, _ := newTestCache(t, "cache", func(w http.ResponseWriter, r *http.Request) {})
cases := []struct {
urlPath string
ok bool
}{
{"", false},
{"/", false},
{"/../etc/passwd", false},
{"/foo/../bar", false},
{"/foo/bar.txt", true},
{"/" + MarkerFile, false},
{"/Project/foo.txt", true},
}
for _, tc := range cases {
_, ok := c.cachePathFor(tc.urlPath)
if ok != tc.ok {
t.Errorf("cachePathFor(%q) ok=%v, want %v", tc.urlPath, ok, tc.ok)
}
}
}


@@ -28,6 +28,16 @@ type Config struct {
AccessLog string // --access-log / ZDDC_ACCESS_LOG — file path for tee'd JSON access log; empty = stderr only
Insecure bool // --insecure / ZDDC_INSECURE=1 — opt out of safety checks (currently: allow start without a root .zddc, leaving the tree publicly accessible)
NoAuth bool // --no-auth / ZDDC_NO_AUTH=1 — skip ACL enforcement entirely. This instance is NOT the security boundary; on master = "open" (anyone reads everything), on a client = "trust upstream's filtering, don't re-evaluate ACLs locally."
// Client-mode flags. When Upstream is non-empty, this binary runs
// as a downstream proxy/cache/mirror against the named master.
// Root then becomes the cache directory rather than the served
// data root. Master-mode flags (apps, archive, opa, etc.) are
// ignored in client mode — see cmd/zddc-server/main.go.
Upstream string // --upstream / ZDDC_UPSTREAM — master URL (https://master.example.com); empty = run as master
Mode string // --mode / ZDDC_MODE — "proxy" (no disk persistence), "cache" (default; persist on access), "mirror" (cache + access-triggered subtree warmer; phase 3)
BearerFile string // --bearer-file / ZDDC_BEARER_FILE — path to a 0600 file containing the master-issued token to forward upstream
SkipTLSVerify bool // --skip-tls-verify / ZDDC_SKIP_TLS_VERIFY=1 — accept self-signed / untrusted upstream certs. Distinct from --no-auth; intended for dev/internal CA scenarios only.
OPAURL string // --opa-url / ZDDC_OPA_URL — policy decider endpoint: "internal" (default), "http(s)://..." (real OPA via HTTP), or "unix:///..." (OPA via Unix socket)
OPAFailOpen bool // --opa-fail-open / ZDDC_OPA_FAIL_OPEN=1 — when external OPA is unreachable, allow instead of deny (default: fail closed)
OPACacheTTL time.Duration // --opa-cache-ttl / ZDDC_OPA_CACHE_TTL — external mode only: per-decision cache TTL. Default 1s. Set 0s to disable.
@@ -89,6 +99,14 @@ func Load(args []string) (Config, error) {
"Allow startup with no root .zddc file (the tree is then publicly accessible). Default: refuse to start.")
noAuthFlag := fs.Bool("no-auth", os.Getenv("ZDDC_NO_AUTH") == "1",
"Skip ACL enforcement entirely. On master: anyone reads everything (dev / trusted-LAN / public-read deployments). On client: trust upstream's filtering. Distinct from --insecure (which gates startup-without-.zddc). Default: enforce ACLs.")
upstreamFlag := fs.String("upstream", os.Getenv("ZDDC_UPSTREAM"),
"Master URL (e.g. https://master.example.com). When set, this binary runs as a downstream proxy/cache/mirror against the master; --root becomes the cache directory. Empty (default) = run as master.")
modeFlag := fs.String("mode", getEnv("ZDDC_MODE", "cache"),
"Client mode: \"proxy\" (forward upstream live, no disk persistence), \"cache\" (default; persist responses on access), \"mirror\" (phase 3). Ignored when --upstream is empty.")
bearerFileFlag := fs.String("bearer-file", os.Getenv("ZDDC_BEARER_FILE"),
"Path to a 0600 file containing the master-issued token forwarded as Authorization: Bearer to upstream. See /.tokens on the master to issue one. Ignored when --upstream is empty.")
skipTLSVerifyFlag := fs.Bool("skip-tls-verify", os.Getenv("ZDDC_SKIP_TLS_VERIFY") == "1",
"Accept self-signed / untrusted TLS certs from the upstream. Distinct from --no-auth. Intended for dev or internal-CA scenarios only.")
opaURLFlag := fs.String("opa-url", getEnv("ZDDC_OPA_URL", "internal"),
"Policy decider endpoint: \"internal\" (built-in Go evaluator, default), \"http(s)://host:port\", or \"unix:///path/to/socket\".")
opaFailOpenFlag := fs.Bool("opa-fail-open", os.Getenv("ZDDC_OPA_FAIL_OPEN") == "1",
@@ -157,6 +175,10 @@ func Load(args []string) (Config, error) {
AccessLog: *accessLogFlag,
Insecure: *insecureFlag,
NoAuth: *noAuthFlag,
Upstream: *upstreamFlag,
Mode: *modeFlag,
BearerFile: *bearerFileFlag,
SkipTLSVerify: *skipTLSVerifyFlag,
OPAURL: *opaURLFlag,
OPAFailOpen: *opaFailOpenFlag,
OPACacheTTL: *opaCacheTTLFlag,
@@ -189,7 +211,14 @@ func Load(args []string) (Config, error) {
// accessible to anonymous callers. The vast majority of operators do not
// want that — and the few who do (a deliberately public archive) can pass
// --insecure to acknowledge it. See zddc/README.md § Access control.
//
// Skipped in client mode (cfg.Upstream != ""): the cache directory
// starts empty by design, so a missing .zddc is not a security
// concern — the cache layer doesn't evaluate ACLs locally
// (upstream filtering is the boundary; --no-auth on a client
// formalizes that). The directory will fill in as files are
// fetched, and any cached .zddc files come straight from upstream.
if !cfg.Insecure && cfg.Upstream == "" {
if _, err := os.Stat(filepath.Join(cfg.Root, ".zddc")); os.IsNotExist(err) {
return Config{}, fmt.Errorf(
"no %s/.zddc file found; the served tree would be publicly accessible to anonymous callers. "+
@@ -245,7 +274,12 @@ func Load(args []string) (Config, error) {
// behind an authenticating reverse proxy. Refuse to start when binding
// plain HTTP to a non-loopback interface unless the operator has
// explicitly acknowledged the deployment shape.
//
// In client mode (Upstream set), the local instance never reads the
// email header to make decisions — auth is forwarded as a Bearer
// token to upstream and the local instance trusts upstream's
// filtering. So this check doesn't apply.
if cfg.Upstream == "" && cfg.TLSMode == "none" && !isLoopbackAddr(cfg.Addr) && !*insecureDirectFlag {
return Config{}, fmt.Errorf(
"--tls-cert=none binds plain HTTP to %q which trusts %s headers from any client; "+
"either use TLS (omit --tls-cert or supply a cert), bind to loopback (127.0.0.1: or [::1]:), "+
@@ -253,6 +287,25 @@ func Load(args []string) (Config, error) {
cfg.Addr, cfg.EmailHeader)
}
// Client-mode validation. Only enforced when --upstream is set;
// the same flags are silently ignored in master mode.
if cfg.Upstream != "" {
switch cfg.Mode {
case "proxy", "cache", "mirror":
// ok
case "":
cfg.Mode = "cache"
default:
return Config{}, fmt.Errorf("--mode must be \"proxy\", \"cache\", or \"mirror\"; got %q", cfg.Mode)
}
if !strings.HasPrefix(cfg.Upstream, "http://") && !strings.HasPrefix(cfg.Upstream, "https://") {
return Config{}, fmt.Errorf("--upstream %q must start with http:// or https://", cfg.Upstream)
}
// TrimRight is a no-op when there is no trailing slash, so no guard is needed.
cfg.Upstream = strings.TrimRight(cfg.Upstream, "/")
}
return cfg, nil
}
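The `--upstream` check in Load only tests for an `http://` or `https://` prefix, so a value such as `https://` with no host passes validation and fails later at request time. A stricter variant is sketched below using `net/url`; `normalizeUpstream` is a hypothetical name, and rejecting a query or fragment is an assumption about what a sensible base URL looks like, not something this PR specifies.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeUpstream is a hypothetical, stricter variant of the --upstream
// checks in Load: it requires an http(s) scheme and a non-empty host,
// rejects a query or fragment, and trims trailing slashes so that
// upstream+urlPath never produces "//".
func normalizeUpstream(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("--upstream %q: %v", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", fmt.Errorf("--upstream %q must start with http:// or https://", raw)
	}
	if u.Host == "" {
		return "", fmt.Errorf("--upstream %q has no host", raw)
	}
	if u.RawQuery != "" || u.ForceQuery || u.Fragment != "" {
		return "", fmt.Errorf("--upstream %q must not carry a query or fragment", raw)
	}
	return strings.TrimRight(raw, "/"), nil
}

func main() {
	for _, in := range []string{"https://master.example.com/", "https://", "master.example.com"} {
		got, err := normalizeUpstream(in)
		fmt.Printf("%q -> %q, err=%v\n", in, got, err)
	}
}
```

Note that `url.Parse` lowercases the scheme, so this variant also accepts `HTTPS://...`, which the prefix check in Load would reject.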
@@ -279,6 +332,10 @@ func Usage(w io.Writer) {
fs.Bool("insecure-direct", false, "Allow plain HTTP on non-loopback addresses.")
fs.Bool("insecure", false, "Allow startup with no root .zddc file (publicly accessible). Default: refuse.")
fs.Bool("no-auth", false, "Skip ACL enforcement entirely. On master: anyone reads everything. On client: trust upstream's filtering. Distinct from --insecure.")
fs.String("upstream", "", "Master URL — when set, run as a downstream proxy/cache/mirror; --root becomes the cache directory. Empty (default) = master.")
fs.String("mode", "cache", "Client mode: proxy / cache / mirror. Ignored when --upstream is empty.")
fs.String("bearer-file", "", "Path to a 0600 file holding the master-issued bearer token forwarded to upstream. Ignored when --upstream is empty.")
fs.Bool("skip-tls-verify", false, "Accept self-signed / untrusted upstream TLS certs. Distinct from --no-auth. Dev / internal-CA scenarios only.")
fs.String("opa-url", "internal", "Policy decider: \"internal\", \"http(s)://...\", or \"unix:///...\".")
fs.Bool("opa-fail-open", false, "External OPA: allow on transport error (default: deny / fail closed).")
fs.Duration("opa-cache-ttl", time.Second, "External OPA: per-decision cache TTL (default 1s; 0 disables).")