ZDDC/zddc/internal/policy/policy.go
ZDDC 97ffaac13b feat(server): self-issued bearer tokens + --no-auth flag
zddc-server now issues its own bearer tokens for non-browser callers
(CLI tools, scripts, downstream proxy/cache/mirror instances). No
external IDP, no JWKS rotation. Self-service flow: sign in via the
browser, visit /.tokens, click "Create token," paste the resulting
plaintext into a 0600 file, and pass --bearer-file <path> to whatever
calls back into the server.

Storage is <ZDDC_ROOT>/.zddc.d/tokens/<sha256-hex>, YAML per token
with email/created/expires/description. Filename is the *hash* of the
plaintext, never the plaintext itself — a leak of the tokens
directory exposes hashes, not credentials. Mode 0600 / 0700, atomic
writes via temp+rename. Already shielded from public serving by the
existing dot-prefix guards in dispatch and fs.ListDirectory.
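
A minimal sketch of the hash-as-filename scheme described above, assuming the filename is the lowercase hex SHA-256 of the raw plaintext bytes. The helper name `tokenFileName` and the sample token are hypothetical, not taken from the zddc-server source.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// tokenFileName is a hypothetical helper. It derives the on-disk name
// for a token from its plaintext, so the plaintext itself is never
// written to the tokens directory.
func tokenFileName(plaintext string) string {
	sum := sha256.Sum256([]byte(plaintext))
	return hex.EncodeToString(sum[:])
}

func main() {
	// The sample plaintext is made up for illustration.
	name := tokenFileName("example-plaintext-token")
	fmt.Println(len(name), name)
}
```

Under this assumption, validating a presented token means hashing it and looking up the resulting filename; the stored file never needs to contain the credential.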

ACLMiddleware now recognises Authorization: Bearer <token>. On valid
token, sets the request email from the token file and falls through
to the existing ACL chain. On any failure (unknown / expired / store
unavailable / Bearer with no validator), returns 401 — no silent
fallback to anonymous, so a misconfigured client fails loudly.
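
The case-insensitive prefix handling mentioned in the test summary might look roughly like the sketch below. `bearerToken` is a hypothetical helper, not the ACLMiddleware code; it covers header parsing only and leaves out token validation and the 401 response.

```go
package main

import (
	"fmt"
	"strings"
)

// bearerToken is a hypothetical helper. It returns the token and true
// when the Authorization header uses the Bearer scheme (matched
// case-insensitively) and carries a non-empty token.
func bearerToken(header string) (string, bool) {
	const prefix = "bearer "
	if len(header) < len(prefix) || !strings.EqualFold(header[:len(prefix)], prefix) {
		return "", false
	}
	token := strings.TrimSpace(header[len(prefix):])
	return token, token != ""
}

func main() {
	for _, h := range []string{"Bearer abc123", "bearer abc123", "Basic abc123", "Bearer "} {
		tok, ok := bearerToken(h)
		fmt.Printf("%q -> %q %v\n", h, tok, ok)
	}
}
```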

JSON API at /.api/tokens (GET list, POST create, DELETE /<id> revoke)
backs a small inline HTML self-service page at /.tokens. Users can
only see and revoke their own tokens; cross-user revoke returns 404
to avoid leaking ownership.
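
A sketch of the ownership rule above, using an in-memory map as a stand-in for the token store. The type `tokenOwners`, the method `revokeStatus`, and the 204 success status are assumptions made for illustration; the commit message only states that a cross-user revoke returns 404.

```go
package main

import (
	"fmt"
	"net/http"
)

// tokenOwners is a hypothetical in-memory stand-in for the token
// store: token ID -> owner email.
type tokenOwners map[string]string

// revokeStatus returns the HTTP status a revoke request would receive.
// Unknown IDs and IDs owned by another user both yield 404, so the
// response does not reveal whether the token exists.
func (s tokenOwners) revokeStatus(id, requester string) int {
	owner, ok := s[id]
	if !ok || owner != requester {
		return http.StatusNotFound
	}
	delete(s, id)
	return http.StatusNoContent // assumed success status
}

func main() {
	store := tokenOwners{"t1": "alice@example.com"}
	fmt.Println(store.revokeStatus("t1", "bob@example.com"))     // another user's token
	fmt.Println(store.revokeStatus("nope", "alice@example.com")) // unknown ID
	fmt.Println(store.revokeStatus("t1", "alice@example.com"))   // own token
}
```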

--no-auth (ZDDC_NO_AUTH=1) skips ACL enforcement entirely on this
instance. On master: anyone reads everything (dev / trusted-LAN /
public-read deployments). On a downstream proxy/cache/mirror: trust
upstream's filtering, don't re-evaluate ACLs locally. Implemented as
a swap to policy.AllowAllDecider; all existing handlers keep calling
AllowFromChain unchanged. Distinct from --insecure, which only
relaxes the no-root-.zddc startup check. WARN-level startup log when
--no-auth is active so accidental enablement is visible.

33 new tests covering token storage, validation/expiry/revocation,
the JSON API end-to-end, the HTML page, and the middleware-Bearer
integration including the case-insensitive prefix and expired-token
paths. Full suite + go vet clean.

Doc updates: zddc/README.md "Authentication" rewritten to cover both
auth paths and the token UI/API; AGENTS.md gains ZDDC_NO_AUTH and a
"Bearer tokens" subsection flagging the dot-prefix-shielding pre-
condition; ARCHITECTURE.md adds "Bearer token issuance" and
"--no-auth" subsections under "Server security model" with the
hash-as-filename rationale and dispatch-shielding regression-
sensitivity called out; CLAUDE.md adds a one-line summary of the new
auth topology so future agents pick it up by default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 07:40:28 -05:00

// Package policy is the access-decision boundary for zddc-server.
//
// All ACL checks in handlers go through Decider.Allow rather than
// calling zddc.AllowedWithChain directly. This lets a deployment
// route policy decisions to an external OPA-compatible server
// (for federal customers running their own audited Rego policies)
// without changing handler code.
//
// Two implementations:
//
//   - InternalDecider: wraps zddc.AllowedWithChain. The default;
//     no new dependencies, identical semantics to the legacy code
//     path. This is what the docs in zddc/README.md describe.
//
//   - HTTPDecider: POSTs to OPA's canonical /v1/data/<package>/allow
//     endpoint over HTTP or a Unix-domain socket. Federal customers
//     deploy real OPA alongside zddc-server, write their own Rego,
//     and point ZDDC_OPA_URL at it.
//
// Configuration knob:
//
//	ZDDC_OPA_URL=                         # internal (default)
//	ZDDC_OPA_URL=internal                 # internal (explicit)
//	ZDDC_OPA_URL=http://127.0.0.1:8181    # external HTTP
//	ZDDC_OPA_URL=https://opa.example:8181 # external HTTPS
//	ZDDC_OPA_URL=unix:///run/opa/opa.sock # external Unix socket
//
// Failure mode (external only): unreachable / non-2xx / malformed
// response → fail closed (deny), with a WARN log. Operators who
// prefer availability over correctness can set ZDDC_OPA_FAIL_OPEN=1
// to flip to fail-open with a WARN log instead.
package policy

import (
	"bytes"
	"context"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"log/slog"
	"net"
	"net/http"
	"net/url"
	"strings"
	"sync"
	"time"

	"codeberg.org/VARASYS/ZDDC/zddc/internal/zddc"
)
// AllowInput is the canonical input shape for Decider.Allow. It
// matches OPA's input convention: a JSON object passed as the
// "input" field of a /v1/data/<package>/allow query.
//
// External Rego policies can:
//   - read input.user.email (string)
//   - read input.path (string)
//   - read input.action ("read" | "write" | "create" | "delete" |
//     "admin"); empty/absent ≡ "read"
//   - walk input.policy_chain.levels[].acl.{allow,deny} for
//     custom cascade semantics, or read the pre-resolved
//     input.policy_chain.has_any_file when implementing the
//     same default-deny rule we use internally.
//
// Action distinguishes read (GET/HEAD on listings, files, app HTML)
// from the mutating verbs (PUT, DELETE, POST/move on the file API);
// see the Action* constants below. The internal decider maps each
// action to a zddc.VerbSet bit via actionVerb and checks that bit
// against the chain. External Rego policies can branch on
// input.action to apply their own per-verb rules.
type AllowInput struct {
	User struct {
		Email string `json:"email"`
	} `json:"user"`
	Path        string             `json:"path"`
	Action      string             `json:"action,omitempty"`
	PolicyChain *SerializableChain `json:"policy_chain,omitempty"`
}
// Action constants used in AllowInput.Action. Empty string is also
// accepted for back-compat with callers that don't specify a verb.
const (
	ActionRead   = "read"   // listing + reading file bytes
	ActionWrite  = "write"  // overwriting an existing file (legacy alias for the historical write-vs-read split)
	ActionCreate = "create" // creating a new file or directory
	ActionDelete = "delete" // deleting a file
	ActionAdmin  = "admin"  // modifying ACL / .zddc / role definitions
)
// actionVerb maps an Action string to the zddc.VerbSet bit it requires.
// Returns the read verb for unrecognized values so the internal
// decider stays restrictive on unknown action labels.
func actionVerb(action string) zddc.VerbSet {
	switch action {
	case ActionWrite:
		return zddc.VerbW
	case ActionCreate:
		return zddc.VerbC
	case ActionDelete:
		return zddc.VerbD
	case ActionAdmin:
		return zddc.VerbA
	default:
		return zddc.VerbR
	}
}
// SerializableChain is a JSON-friendly view of zddc.PolicyChain.
// We don't tag zddc.PolicyChain itself because it's tightly coupled
// to the parser; the duplication is one struct.
type SerializableChain struct {
	Levels     []zddc.ZddcFile `json:"levels"`
	HasAnyFile bool            `json:"has_any_file"`
}

func chainToSerializable(c zddc.PolicyChain) *SerializableChain {
	return &SerializableChain{Levels: c.Levels, HasAnyFile: c.HasAnyFile}
}

// Decider is the access-decision interface every handler uses.
type Decider interface {
	Allow(ctx context.Context, input AllowInput) (bool, error)
}
// Config selects and parameterizes the decider.
type Config struct {
	URL      string        // raw value: "", "internal", "http(s)://...", "unix:///path"
	FailOpen bool          // external mode only: on any failed decision (transport error, non-2xx, malformed response), allow instead of deny
	CacheTTL time.Duration // external mode only: per-decision cache TTL. Zero = default 1s. Negative = no cache.

	// CascadeMode controls how the InternalDecider walks the ACL chain:
	// "delegated" (the default; leaf grants override ancestor denies) or
	// "strict" (ancestor explicit-deny is absolute; NIST AC-6).
	// External deciders ignore this; Rego policies access the chain
	// directly and implement either semantic themselves.
	CascadeMode string
}
// New constructs a Decider per cfg.URL semantics.
//   - "" or "internal" → InternalDecider (no cache: the in-process
//     evaluator is already cheaper than a cache lookup would be)
//   - "http(s)://..." → HTTPDecider wrapped in a small per-decision
//     cache (default 1s TTL: short enough that staleness is bounded
//     to the same window as fsnotify-debounced index refresh, long
//     enough to amortize bursty listings like .archive enumeration
//     into one OPA round-trip per (email, decision-input))
//   - "unix:///..." → same as http(s), over a Unix socket
//
// Returns an error if URL is unrecognized.
func New(cfg Config) (Decider, error) {
	mode, _ := zddc.ParseCascadeMode(cfg.CascadeMode)
	if cfg.URL == "" || strings.EqualFold(cfg.URL, "internal") {
		return &InternalDecider{Mode: mode}, nil
	}

	var inner Decider
	var err error
	switch {
	case strings.HasPrefix(cfg.URL, "http://"), strings.HasPrefix(cfg.URL, "https://"):
		inner, err = newHTTPDecider(cfg.URL, cfg.FailOpen, nil)
	case strings.HasPrefix(cfg.URL, "unix://"):
		path := strings.TrimPrefix(cfg.URL, "unix://")
		dialer := &net.Dialer{Timeout: 2 * time.Second}
		transport := &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return dialer.DialContext(ctx, "unix", path)
			},
		}
		inner, err = newHTTPDecider("http://opa-unix-socket", cfg.FailOpen, transport)
	default:
		return nil, fmt.Errorf("unrecognized ZDDC_OPA_URL %q (want \"internal\", http(s)://..., or unix:///...)", cfg.URL)
	}
	if err != nil {
		return nil, err
	}

	ttl := cfg.CacheTTL
	if ttl == 0 {
		ttl = time.Second
	}
	if ttl < 0 {
		// Negative TTL = caching disabled (test seam).
		return inner, nil
	}
	return &cachingDecider{inner: inner, ttl: ttl}, nil
}
// AllowAllDecider unconditionally permits every request. Used when the
// operator runs zddc-server with --no-auth — that flag declares the
// instance is NOT the ACL boundary (master in a trusted-LAN deployment,
// or client mode where the upstream master enforced ACLs and the local
// instance trusts those filtering decisions). Swap into the decider
// slot at startup; all existing handlers continue to call Allow* and
// see allowed=true.
type AllowAllDecider struct{}
// Allow always returns true with nil error.
func (AllowAllDecider) Allow(_ context.Context, _ AllowInput) (bool, error) {
	return true, nil
}
// InternalDecider routes Allow through zddc.AllowedAction with the
// configured cascade mode and applies the Issued/Received WORM mask
// post-decision. No network, no Rego, no new dependencies.
//
// The decider does NOT consult the admins:/IsAdmin escape hatch —
// callers in the handler package wire IsAdmin / IsSubtreeAdmin around
// the decision. Admins bypass the WORM mask there as well.
type InternalDecider struct {
	Mode zddc.CascadeMode
}

func (d *InternalDecider) Allow(_ context.Context, input AllowInput) (bool, error) {
	chain := zddc.PolicyChain{}
	if input.PolicyChain != nil {
		chain.Levels = input.PolicyChain.Levels
		chain.HasAnyFile = input.PolicyChain.HasAnyFile
	}
	verb := actionVerb(input.Action)
	email := input.User.Email

	// WORM split: in Issued/Received, ancestor grants are read-only;
	// only an explicit .zddc placed at-or-below the WORM folder can
	// restore `c` (write-once) for principals it names. Admins are
	// excluded from this code path by callers (handler package does
	// the IsAdmin / IsSubtreeAdmin bypass before invoking Allow).
	//
	// EffectiveVerbsRange (rather than slicing chain.Levels) keeps the
	// FULL chain visible to role-membership lookups so an ancestor's
	// role definition still applies inside the sub-range walk.
	if zddc.IsWormPath(input.Path) {
		wormIdx := zddc.WormFolderLevelIndex(input.Path, len(chain.Levels))
		if wormIdx >= 0 {
			grantAbove := zddc.EffectiveVerbsRange(chain, 0, wormIdx, email, d.Mode) & zddc.VerbR
			grantBelow := zddc.EffectiveVerbsRange(chain, wormIdx, len(chain.Levels), email, d.Mode) & zddc.VerbsRC
			return (grantAbove | grantBelow).Has(verb), nil
		}
	}
	return zddc.AllowedAction(chain, email, verb, d.Mode), nil
}
// HTTPDecider POSTs to /v1/data/zddc/access/allow on the configured
// endpoint. Spec:
//   - request body {"input": <AllowInput>}
//   - response body {"result": true|false}
//   - 5-second per-request timeout
//   - non-2xx, transport error, missing/malformed result → policy
//     decision is "deny" unless FailOpen=true
//
// The path "/v1/data/zddc/access/allow" is the OPA convention; the
// "zddc.access" Rego package on an external server would expose
// `allow` for this endpoint.
type HTTPDecider struct {
	endpoint string
	client   *http.Client
	failOpen bool
}

func newHTTPDecider(endpoint string, failOpen bool, transport http.RoundTripper) (*HTTPDecider, error) {
	if _, err := url.Parse(endpoint); err != nil {
		return nil, fmt.Errorf("invalid OPA URL %q: %w", endpoint, err)
	}
	c := &http.Client{Timeout: 5 * time.Second}
	if transport != nil {
		c.Transport = transport
	}
	return &HTTPDecider{
		endpoint: strings.TrimRight(endpoint, "/") + "/v1/data/zddc/access/allow",
		client:   c,
		failOpen: failOpen,
	}, nil
}
type opaResponse struct {
	Result *bool `json:"result"`
}

func (d *HTTPDecider) Allow(ctx context.Context, input AllowInput) (bool, error) {
	body, err := json.Marshal(struct {
		Input AllowInput `json:"input"`
	}{Input: input})
	if err != nil {
		return d.failResult(fmt.Errorf("marshal input: %w", err))
	}

	req, err := http.NewRequestWithContext(ctx, http.MethodPost, d.endpoint, bytes.NewReader(body))
	if err != nil {
		return d.failResult(fmt.Errorf("build request: %w", err))
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", "application/json")

	resp, err := d.client.Do(req)
	if err != nil {
		return d.failResult(fmt.Errorf("opa request: %w", err))
	}
	defer resp.Body.Close()

	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		// Read up to 512 bytes of the error body for the log without
		// blowing up on a verbose OPA error page.
		snippet, _ := io.ReadAll(io.LimitReader(resp.Body, 512))
		return d.failResult(fmt.Errorf("opa returned %d: %s", resp.StatusCode, strings.TrimSpace(string(snippet))))
	}

	var parsed opaResponse
	if err := json.NewDecoder(resp.Body).Decode(&parsed); err != nil {
		return d.failResult(fmt.Errorf("decode opa response: %w", err))
	}
	if parsed.Result == nil {
		return d.failResult(errors.New("opa response missing 'result' field"))
	}
	return *parsed.Result, nil
}
// failResult logs the failure and returns the configured fail-mode
// decision. Logged at WARN so a healthy run is silent but a sick OPA
// is loud.
func (d *HTTPDecider) failResult(err error) (bool, error) {
	if d.failOpen {
		slog.Warn("policy decision failed; failing open (allow)", "endpoint", d.endpoint, "err", err)
		return true, nil
	}
	slog.Warn("policy decision failed; failing closed (deny)", "endpoint", d.endpoint, "err", err)
	return false, nil
}
// AllowFromChain is a convenience for callers that already have a
// PolicyChain in hand. Equivalent to constructing AllowInput manually
// from (chain, email, path) and calling d.Allow. Implies "read".
//
// New callers should use AllowActionFromChain with an explicit verb so
// the audit/policy stream records intent and the internal decider can
// apply the right verb-specific check.
func AllowFromChain(ctx context.Context, d Decider, chain zddc.PolicyChain, email, path string) (bool, error) {
	return AllowActionFromChain(ctx, d, chain, email, path, ActionRead)
}
// AllowWriteFromChain is the legacy write-action helper. Newer callers
// should pick the specific verb (ActionCreate / ActionWrite /
// ActionDelete / ActionAdmin) via AllowActionFromChain instead.
func AllowWriteFromChain(ctx context.Context, d Decider, chain zddc.PolicyChain, email, path string) (bool, error) {
	return AllowActionFromChain(ctx, d, chain, email, path, ActionWrite)
}
// AllowActionFromChain is the canonical access-decision helper.
// External Rego policies can branch on input.action to differentiate
// among the five verbs (read / write / create / delete / admin). The
// internal decider maps each action to its zddc.VerbSet bit and walks
// the cascade in the configured mode (delegated / strict).
func AllowActionFromChain(ctx context.Context, d Decider, chain zddc.PolicyChain, email, path, action string) (bool, error) {
	in := AllowInput{Path: path, Action: action, PolicyChain: chainToSerializable(chain)}
	in.User.Email = email
	return d.Allow(ctx, in)
}
// cachingDecider wraps another Decider with a small per-decision cache.
// Designed for the external-OPA hot path: a single .archive listing or
// directory enumeration can hit the same (email, dir-policy) tuple
// dozens of times in milliseconds, and a remote OPA round-trip per
// query would dominate latency. The 1s default TTL bounds staleness to
// the same window as the fsnotify watcher's debounce, so a `.zddc` edit
// is reflected in the next listing rather than carried over indefinitely.
//
// Key shape: SHA-256 of the canonical JSON-serialized AllowInput. This
// makes the cache safe across all input variations (different paths,
// different chains, different users) without us having to enumerate
// the dimensions.
type cachingDecider struct {
	inner Decider
	ttl   time.Duration

	mu      sync.Mutex
	entries map[string]cacheEntry
}

type cacheEntry struct {
	expires time.Time
	allow   bool
}

func (d *cachingDecider) Allow(ctx context.Context, input AllowInput) (bool, error) {
	key, err := cacheKey(input)
	if err != nil {
		// Couldn't key: fall through to inner without caching. Should
		// never happen in practice; AllowInput marshals as plain JSON.
		return d.inner.Allow(ctx, input)
	}

	now := time.Now()
	d.mu.Lock()
	if d.entries == nil {
		d.entries = make(map[string]cacheEntry)
	}
	if e, ok := d.entries[key]; ok && now.Before(e.expires) {
		d.mu.Unlock()
		return e.allow, nil
	}
	d.mu.Unlock()

	// The lock is not held across the inner call, so concurrent misses
	// for the same key may each query the inner decider.
	allow, err := d.inner.Allow(ctx, input)
	if err != nil {
		return allow, err
	}

	d.mu.Lock()
	// Best-effort eviction: once the map holds more than 4096 entries,
	// each miss sweeps out the expired ones. The sweep is O(n) and
	// removes only expired entries, so the map can stay above 4096
	// while entries are still live; with a short TTL that is fine for
	// this scale.
	if len(d.entries) > 4096 {
		for k, e := range d.entries {
			if now.After(e.expires) {
				delete(d.entries, k)
			}
		}
	}
	d.entries[key] = cacheEntry{expires: now.Add(d.ttl), allow: allow}
	d.mu.Unlock()
	return allow, nil
}
func cacheKey(input AllowInput) (string, error) {
	b, err := json.Marshal(input)
	if err != nil {
		return "", err
	}
	h := sha256.Sum256(b)
	return hex.EncodeToString(h[:]), nil
}
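
The wire format documented on HTTPDecider (request body `{"input": ...}`, response body `{"result": true|false}`, path `/v1/data/zddc/access/allow`) can be exercised against a stub server. The sketch below is a standalone program, not part of the policy package: `stubOPA`, `ask`, the `query` type, and the stub's allow rule (reads only, non-empty email) are all made up for illustration, and a real deployment would run OPA with its own Rego policy instead.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// query mirrors the {"input": {...}} envelope; only the fields this
// sketch inspects are declared.
type query struct {
	Input struct {
		User struct {
			Email string `json:"email"`
		} `json:"user"`
		Path   string `json:"path"`
		Action string `json:"action"`
	} `json:"input"`
}

// stubOPA is a hypothetical stand-in for an OPA server. Its made-up
// rule allows reads for any non-empty email and denies everything else.
func stubOPA() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/data/zddc/access/allow", func(w http.ResponseWriter, r *http.Request) {
		var q query
		if err := json.NewDecoder(r.Body).Decode(&q); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		allow := q.Input.User.Email != "" && (q.Input.Action == "" || q.Input.Action == "read")
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]bool{"result": allow})
	})
	return httptest.NewServer(mux)
}

// ask posts one decision query and returns the decoded result.
func ask(baseURL, email, path, action string) (bool, error) {
	var q query
	q.Input.User.Email, q.Input.Path, q.Input.Action = email, path, action
	body, err := json.Marshal(q)
	if err != nil {
		return false, err
	}
	resp, err := http.Post(baseURL+"/v1/data/zddc/access/allow", "application/json", bytes.NewReader(body))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	var out struct {
		Result *bool `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return false, err
	}
	if out.Result == nil {
		return false, fmt.Errorf("response missing result")
	}
	return *out.Result, nil
}

func main() {
	srv := stubOPA()
	defer srv.Close()
	ok, err := ask(srv.URL, "alice@example.com", "/docs", "read")
	fmt.Println(ok, err)
}
```

Decoding `result` into a `*bool` follows the same idea as `opaResponse` above: a missing field stays distinguishable from an explicit `false`.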