Commit graph

6 commits

Author SHA1 Message Date
ac7553f940 fix(client): plug confused-deputy bind in client mode
A focused security review of phases 1-4 surfaced one MEDIUM finding
(confidence 9/10): in client mode (--upstream set) the cache layer
forwards the configured bearer to upstream on every incoming request
without authenticating the local caller, AND --addr defaulted to
:8443 (all interfaces). Together those mean a CLI user running
`zddc-server --upstream https://master --bearer-file ~/token` on a
laptop on hotel/cafe Wi-Fi exposes an open-proxy confused-deputy:
any attacker on the same L2 connects to https://<laptop-ip>:8443,
accepts the self-signed cert, issues GETs (or PUTs/DELETEs that
queue in the outbox), and the cache launders each request through
upstream with the engineer's bearer. The full cached subtree leaks.

Two layers of defense in config.Load:

1. Loopback default in client mode. When cfg.Upstream is set and
   neither --addr nor ZDDC_ADDR was passed explicitly, --addr
   downgrades to "127.0.0.1:8443" (vs ":8443" in master mode). CLI
   users on a laptop get a safe default. Operators who want a
   non-loopback bind opt in explicitly.

2. Refuse non-loopback bind + bearer-file without acknowledgement.
   When cfg.Upstream is set, BearerFile is non-empty, the chosen
   addr is non-loopback, AND --insecure-direct is not set, the load
   fails with an error that names the bind, the threat (open-proxy
   confused-deputy laundering bearer credentials), and the
   acknowledgement flag. The helm zddc-server-cache/ chart already
   sets ZDDC_INSECURE_DIRECT=1 and relies on Kubernetes-namespaced
   pod networking for the gating, so the chart path is unaffected.
   The guard is bearer-file-conditional because proxy mode without a
   bearer doesn't have a credential to launder, and refusing it
   would needlessly block proxy-without-auth deployments.
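
A minimal sketch of the two layers above, for readers who want to see
the decision order in one place. This is not the code in config.Load:
the Config struct, its field names (AddrExplicit in particular),
isLoopbackAddr, and applyClientModeGuard are all hypothetical, and
treating the literal host "localhost" as loopback is an assumption.

```go
package main

import (
	"fmt"
	"net"
)

// Config holds only the fields this sketch needs. The field names are
// assumptions, not the real zddc-server config struct.
type Config struct {
	Upstream       string // non-empty means client mode
	Addr           string // listen address, e.g. ":8443"
	AddrExplicit   bool   // true if --addr or ZDDC_ADDR was passed
	BearerFile     string // path to the bearer token file, if any
	InsecureDirect bool   // --insecure-direct acknowledgement
}

// isLoopbackAddr reports whether addr binds only to a loopback interface.
// An empty host (":8443") means all interfaces, so it is not loopback.
func isLoopbackAddr(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil || host == "" {
		return false
	}
	if host == "localhost" { // assumption: treat the name as loopback
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// applyClientModeGuard applies both layers. Master mode is left alone.
func applyClientModeGuard(cfg *Config) error {
	if cfg.Upstream == "" {
		return nil // master mode: no client-mode validation
	}
	// Layer 1: loopback default when the operator did not choose an addr.
	if !cfg.AddrExplicit {
		cfg.Addr = "127.0.0.1:8443"
	}
	// Layer 2: refuse non-loopback bind + bearer file without acknowledgement.
	if cfg.BearerFile != "" && !isLoopbackAddr(cfg.Addr) && !cfg.InsecureDirect {
		return fmt.Errorf("refusing to bind %q in client mode with a bearer file: "+
			"open-proxy confused-deputy would launder bearer credentials; "+
			"pass --insecure-direct to acknowledge", cfg.Addr)
	}
	return nil
}

func main() {
	cfg := &Config{Upstream: "https://master", BearerFile: "/home/me/token"}
	if err := applyClientModeGuard(cfg); err != nil {
		fmt.Println("refused:", err)
		return
	}
	fmt.Println("listening on", cfg.Addr)
}
```

Keeping the default and the refusal in one function means the refusal
always sees the address that will actually be bound, whether it came
from the flag, the environment, or the client-mode default.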

Tests in internal/config/config_test.go lock down all four cases:
- --upstream with no explicit --addr → 127.0.0.1:8443
- --upstream + non-loopback --addr + --bearer-file (no --insecure-direct) → refuse
- --upstream + non-loopback --addr + --bearer-file + --insecure-direct → ok
- --upstream + non-loopback --addr + NO bearer → ok (no credential to leak)

Doc updates: zddc/README.md client-mode "Flags" section gets a
WARNING block describing the loopback default + insecure-direct
escape hatch. AGENTS.md ZDDC_UPSTREAM row mentions the addr
downgrade. ARCHITECTURE.md gains a "Confused-deputy guard at
startup" subsection under "Master + proxy/cache/mirror" with the
two-layer defense rationale. helm/zddc-server-cache/values.yaml.example
adds an inline note next to addr: ":8080" explaining why the chart
sets ZDDC_INSECURE_DIRECT=1 and what the consequence is of removing
either side of the gating.

Master mode is unaffected — the client-mode validation block is
gated by `if cfg.Upstream != ""`. All existing tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 10:03:51 -05:00
55852a9efb helm: add zddc-server-cache example chart + ZDDC_NO_AUTH on prod/dev
New chart helm/zddc-server-cache/ deploys zddc-server in client mode
against an upstream master. Mirrors the prod chart's source-build-via-
init-container pattern but with:

- ZDDC_UPSTREAM, ZDDC_MODE, ZDDC_BEARER_FILE, ZDDC_NO_AUTH,
  ZDDC_SKIP_TLS_VERIFY, ZDDC_MIRROR_SUBTREE, ZDDC_MIRROR_MIN_INTERVAL
  wired from values.yaml. Mirror-only env vars conditionally rendered
  (only when mode=mirror) to keep the rendered manifest minimal.
- Bearer token mounted from a separately-created Kubernetes Secret
  (defaultMode 0400) at /etc/zddc/bearer/token. values.yaml.example
  documents the secret-creation flow but contains no token. Secret
  reference can be set to "" to disable bearer auth (only valid for
  upstreams running --no-auth).
- Recreate strategy + replicaCount: 1 (multiple replicas would race
  the cache directory and double the upstream walker traffic).
- TCP-socket probes instead of HTTP — HTTP probes against / would
  fail when both upstream is unreachable AND the cache is empty
  (the cache layer returns 503 + offline header in that state),
  causing crashloops. TCP verifies process liveness without depending
  on upstream reachability or cache contents.
- Mounts a separate cache PVC (operator-provided, like the master's
  data PVC). Sized to the working set you expect to mirror; can be
  much smaller than the master's data volume.

Existing prod and dev charts gain optional ZDDC_NO_AUTH wired from
zddc.env.noAuth (default false → no change to existing rendered
manifests). Useful for trusted-LAN or genuinely-public master
deployments.

Updated docs: helm/README.md gains the cache row in the chart table,
the cache-install quickstart with the secret-creation flow, and the
cache-specific structural notes (Recreate / TCP probes / single-
instance). CLAUDE.md and ARCHITECTURE.md updated to reflect three
charts instead of two.

Verified with helm template rendering: ZDDC_NO_AUTH only renders
when noAuth: true; ZDDC_MIRROR_SUBTREE / ZDDC_MIRROR_MIN_INTERVAL
only render when mode: mirror; bearer volume + ZDDC_BEARER_FILE
only render when bearer.secretName is non-empty.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 08:33:01 -05:00
13ae1498e4 docs(helm): describe dev chart's OverlayFS isolation in README + Chart.yaml
The dev chart's overlay-isolation layer (added in 9765fa2) was not
called out in helm/README.md or zddc-server-dev/Chart.yaml. Readers
comparing the two charts saw "same shape but tracks main" without
learning that the dev chart wraps the data PVC in OverlayFS so its
writes never mutate the underlying store.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:33:04 -05:00
9765fa2f5e feat(apps): code-signed URL fetches; dev chart overlays prod data RO
Two interlocking pieces shipped together:

1. Strict Ed25519 signature verification on URL-fetched apps artifacts.
   Every URL the apps cascade resolves must publish a corresponding
   <url>.sig (raw 64-byte Ed25519 signature). The fetcher rejects on
   any failure (sig 404, transport error, wrong key, tampered body)
   and the resolver falls back to the embedded copy.

   The trusted public key is OPERATOR-CONFIGURED via --apps-pubkey /
   ZDDC_APPS_PUBKEY (PEM file path). No baked-in default — same posture
   as TLS certificates. Operators using zddc.varasys.io's canonical
   channels download pubkey.pem from there and configure the local
   path. Operators with their own signing infrastructure pass their
   own public key.

   Build pipeline (./build) gains sign_release_artifacts: walks
   dist/release-output/ after promote and produces an Ed25519 .sig
   alongside every real file. ZDDC_SIGNING_KEY=~/.config/zddc-signing/
   key.pem (mode 0600). Symlinks are skipped; the .sig at the symlink
   target is what counts.

   Test coverage: parse-PEM round-trip, malformed/wrong-type PEM
   rejection, valid-signature accept, tampered-body reject, wrong-key
   reject, malformed-signature reject, end-to-end fetch+sign+verify,
   fetch-rejects-tampered, fetch-rejects-missing-sig, fetch-rejects-
   wrong-key. Existing fetch tests updated to use signed-fixture
   helpers.

2. Dev Helm chart mounts production data READ-ONLY and layers an
   OverlayFS writable scratch on top. Prod data is the lowerdir;
   dev's writes (form submissions, archive index state, .zddc edits)
   land in upperdir; main container sees the merged read-write view
   at $ZDDC_ROOT. Setup runs in a privileged init container; main
   container runs unprivileged. Solves the dev-replica-on-shared-
   dataset problem at the filesystem layer with no zddc-server code
   change.
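
A minimal sketch of the verification step in item 1, using only the Go
standard library. It is not the fetcher's actual code: the function
names (parseEd25519PublicKeyPEM, verifyArtifact, signedFixture) are
hypothetical, and it assumes the PEM file holds a PKIX-encoded
("PUBLIC KEY") Ed25519 key, which the commit does not state.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
)

// parseEd25519PublicKeyPEM decodes a PEM block and requires that it
// contain a PKIX-encoded Ed25519 public key (assumed encoding).
func parseEd25519PublicKeyPEM(pemBytes []byte) (ed25519.PublicKey, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		return nil, errors.New("no PEM block found")
	}
	key, err := x509.ParsePKIXPublicKey(block.Bytes)
	if err != nil {
		return nil, fmt.Errorf("parse public key: %w", err)
	}
	pub, ok := key.(ed25519.PublicKey)
	if !ok {
		return nil, errors.New("public key is not Ed25519")
	}
	return pub, nil
}

// verifyArtifact checks a raw 64-byte Ed25519 signature over body.
// Any failure is an error; the caller is expected to fall back.
func verifyArtifact(pub ed25519.PublicKey, body, sig []byte) error {
	if len(sig) != ed25519.SignatureSize {
		return fmt.Errorf("signature is %d bytes, want %d", len(sig), ed25519.SignatureSize)
	}
	if !ed25519.Verify(pub, body, sig) {
		return errors.New("signature does not match body")
	}
	return nil
}

// signedFixture generates a throwaway key pair and returns the public
// key as PEM plus a raw signature over body. Illustration only.
func signedFixture(body []byte) (pubPEM, sig []byte, err error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	der, err := x509.MarshalPKIXPublicKey(pub)
	if err != nil {
		return nil, nil, err
	}
	pubPEM = pem.EncodeToMemory(&pem.Block{Type: "PUBLIC KEY", Bytes: der})
	return pubPEM, ed25519.Sign(priv, body), nil
}

func main() {
	body := []byte("artifact bytes")
	pubPEM, sig, err := signedFixture(body)
	if err != nil {
		panic(err)
	}
	pub, err := parseEd25519PublicKeyPEM(pubPEM)
	if err != nil {
		panic(err)
	}
	fmt.Println("valid body:", verifyArtifact(pub, body, sig))
	fmt.Println("tampered body:", verifyArtifact(pub, append(body, '!'), sig))
}
```

Checking the signature length before calling ed25519.Verify gives a
distinct error for a malformed .sig, separate from a wrong key or a
tampered body.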

Docs: env-var tables in zddc/README.md and AGENTS.md gain a
ZDDC_APPS_PUBKEY row. The Federal-readiness gap analysis "Code-signed
apps: URL fetches" subsection is rewritten as "what's currently in
place" instead of "what would need to be added," with a forward
pointer to per-entry signed_by: (multi-key) and Sigstore as the
federally-acceptable evolution.

The website "Verify your downloads" section and the embedded pubkey
are gone from this change. The website still needs separate updates in
zddc-website to publish pubkey.pem and add the verify section; those
are pending in that repo.

Production binary unchanged at 13.1 MB. All 11 Go test packages green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 21:59:07 -05:00
6b973906c3 feat(server): refuse to start without root .zddc; default CORS to empty
Two safe-by-default flips, both opt-out via explicit acknowledgement.

1. --insecure / ZDDC_INSECURE=1: zddc-server now refuses to start when
   no <ZDDC_ROOT>/.zddc exists. With no .zddc anywhere in the chain,
   AllowedWithChain falls through to "HasAnyFile=false → allow" and
   the tree is publicly accessible to anonymous callers — almost never
   what an operator wants on a fresh deployment, and previously a
   silent footgun. The flag is the escape hatch for deliberately-
   public archives (no .zddc anywhere by design).

2. ZDDC_CORS_ORIGIN now defaults to empty (CORS disabled) instead of
   the canonical "https://zddc.varasys.io". The embedded-tools install
   path serves tools and data same-origin, so the default never needed
   to permit cross-origin XHRs from a third-party host. Every deployment
   was implicitly trusting zddc.varasys.io to make authenticated XHRs
   on behalf of every logged-in user; if that origin were ever
   compromised, the blast radius extended to every customer server.
   Operators who deliberately use the CDN-bootstrap pattern or self-
   hosted tools at a different host now set the value explicitly.
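
A minimal sketch of the startup refusal in item 1, assuming the check
is a plain existence test on <ZDDC_ROOT>/.zddc. The function name
checkRootACL and the error wording are hypothetical; the real server
may structure this differently.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// checkRootACL refuses startup when root has no .zddc entry, unless
// the operator acknowledged a public tree via insecure (--insecure).
func checkRootACL(root string, insecure bool) error {
	if insecure {
		return nil // deliberately public archive
	}
	path := filepath.Join(root, ".zddc")
	_, err := os.Stat(path)
	switch {
	case err == nil:
		return nil
	case errors.Is(err, fs.ErrNotExist):
		return fmt.Errorf("%s not found: the tree would be readable by "+
			"anonymous callers; add a root .zddc or pass --insecure", path)
	default:
		return fmt.Errorf("stat %s: %w", path, err)
	}
}

func main() {
	root, err := os.MkdirTemp("", "zddc-root-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	fmt.Println("empty root:", checkRootACL(root, false))

	// Create an empty root .zddc and check again.
	if err := os.WriteFile(filepath.Join(root, ".zddc"), nil, 0o600); err != nil {
		panic(err)
	}
	fmt.Println("with .zddc:", checkRootACL(root, false))
}
```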

Helm chart values updated accordingly: prod default is empty; dev
keeps localhost:8000 for tool-iteration workflows. Existing deployments
that depended on the old defaults will need to either set the value
explicitly or pass --insecure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 17:40:34 -05:00
607121a9ea feat: example helm charts for zddc-server (production + dev)
Two charts under helm/, both compile zddc-server from source via an
init container — no container image registry, no pre-built binary.
The init container clones the repo at a configured git ref, runs
`go build`, and writes the binary into a shared emptyDir; the main
container is alpine + the freshly built static binary.

helm/zddc-server-prod/  Production-shaped:
                        - gitRef pinned to a stable tag in
                          values.yaml.example (zddc-server-v0.0.7).
                        - imagePullPolicy IfNotPresent.
                        - Slower probe cadence (30s liveness, 10s
                          readiness).
                        - ZDDC_LOG_LEVEL=info.
                        - replicaCount: 1 (operators raise as needed
                          when backed by a shared filesystem).

helm/zddc-server-dev/   Dev/soak-shaped:
                        - gitRef defaults to "main" (rebuilt every pod
                          restart). build-time annotation forces
                          recreate on every helm upgrade.
                        - imagePullPolicy Always on the build image
                          so the latest golang:1.24-alpine is pulled.
                        - Faster probe cadence (10s liveness, 5s
                          readiness) — fail-fast in dev.
                        - ZDDC_LOG_LEVEL=debug. NOTE: debug logs every
                          request's full header map (includes auth
                          tokens / cookies) — this chart is for
                          private dev namespaces only.
                        - Strategy: Recreate (single replica; pods racing
                          on different SHAs would be a mess).

Both charts:

- Wire the ZDDC_* env-var contract (ZDDC_ROOT, ZDDC_ADDR,
  ZDDC_TLS_CERT=none, ZDDC_INSECURE_DIRECT=1, ZDDC_EMAIL_HEADER,
  ZDDC_CORS_ORIGIN, ZDDC_LOG_LEVEL, ZDDC_INDEX_PATH).
- Mount a caller-supplied PVC at ZDDC_ROOT (chart does not create the
  PVC; operators provision storage themselves).
- Optional Ingress (ingress.enabled: true). TLS is expected to be
  terminated upstream of the pod; the pod listens on plain HTTP.
- No secrets in values.yaml.example. ACL email lists go in .zddc files
  inside the data volume; image-pull and TLS secrets are referenced by
  name only.

helm/README.md documents the design rationale (why build from source
instead of using a registry image), a quick-start example, and the
explicit list of what the charts do and don't do.

Note: `helm lint` cannot be run in this dev environment (helm isn't
installed). YAML syntax of Chart.yaml and values.yaml.example
verified via `python3 -c "yaml.safe_load(...)"`. Operators should
run `helm lint` and `helm template` before installing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 09:48:02 -05:00