Phase 2 enhancements to the policy decider, plus listing-level ETags
that benefit every deployment regardless of decider mode.

Reference Rego policy
---------------------
internal/policy/rego/access.rego mirrors InternalDecider's semantics
exactly — bottom-up walk, deny-first within a level, default-deny when
HasAnyFile=true, glob matching with @-boundary semantics (special-cased
bare "*" because OPA's glob.match treats empty delimiters
inconsistently for that pattern).

Embedded into the binary via go:embed; --print-rego dumps it to stdout
so federal customers standing up an external OPA can use it as a
parity-tested baseline:

    zddc-server --print-rego > /etc/opa/policies/zddc-access.rego

Parity test runner
------------------
parity_test.go imports the OPA Go module as a TEST-ONLY dependency
(github.com/open-policy-agent/opa@v0.70.0). Every fixture from the
internal Go evaluator's test set runs through both implementations;
any divergence fails CI. The test-only import means production
binaries (built by `go build ./cmd/zddc-server`) stay OPA-free —
release-flag binary size unchanged at ~13 MB.

The parity test caught a real bug on first run: bare "*" patterns
didn't match through OPA's glob.match with empty delimiters. Fixed
in access.rego with a special-case rule. This is exactly the kind of
subtle drift the parity guard exists to catch.
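
The parity idea can be sketched in a few lines of Go. This is an
illustration only, not the contents of parity_test.go: the fixture
type, the two stand-in evaluators (reference, drifted) and the
divergences helper are all hypothetical names, and the matching rules
are deliberately simplified to "exact match or bare star".

```go
package main

import "fmt"

// fixture is a hypothetical test case: one allow-pattern and one email.
type fixture struct {
	name    string
	pattern string
	email   string
}

// decideFunc is the shape both implementations are adapted to.
type decideFunc func(f fixture) bool

// reference stands in for the trusted evaluator: a bare "*" matches
// everything, anything else must match exactly.
func reference(f fixture) bool {
	return f.pattern == "*" || f.pattern == f.email
}

// drifted stands in for a second implementation that forgot the
// bare "*" special case.
func drifted(f fixture) bool {
	return f.pattern == f.email
}

// divergences returns the names of fixtures on which a and b disagree.
func divergences(fixtures []fixture, a, b decideFunc) []string {
	var out []string
	for _, f := range fixtures {
		if a(f) != b(f) {
			out = append(out, f.name)
		}
	}
	return out
}

func main() {
	fixtures := []fixture{
		{"exact match", "a@example.com", "a@example.com"},
		{"no match", "a@example.com", "b@example.com"},
		{"bare star", "*", "a@example.com"},
	}
	fmt.Println(divergences(fixtures, reference, drifted)) // prints [bare star]
}
```

A real runner would fail the test whenever the returned slice is
non-empty, which is how a single shared fixture set guards two
implementations against drift.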
External-mode decision cache
----------------------------
HTTPDecider is now wrapped in a cachingDecider with a default 1s TTL.
Bursty patterns like .archive listings (one OPA round-trip per entry
before, one per (email, decision-input) tuple per TTL window after)
amortize cleanly. Verified: 20 identical /D/ requests produce 1 OPA
hit with cache, 40 hits without (each listing makes 2 ACL queries).

The ZDDC_OPA_CACHE_TTL knob (default 1s) lets operators tune the TTL;
0 disables caching. 1s matches the fsnotify watcher debounce window, so
staleness is bounded the same way other policy-edit propagation
already is.

Internal mode is unchanged; the in-process Go evaluator is already
cheaper than a cache lookup would be.
Listing ETags
-------------
GET / (project list) and GET /<dir>/ (directory listing JSON) now
carry content-hash ETag + Cache-Control: private, max-age=0,
must-revalidate. SHA-256 of the rendered JSON, truncated to 16 hex
chars (64 bits — collision risk on a listing of any realistic size
is vanishingly small).

Server-side caching deliberately not added: it would require
mtime-based invalidation, and Azure Files SMB mounts (a common
deployment substrate) don't support fsnotify reliably. The
content-hash ETag delivers the bandwidth savings (304 on identical
fetches) without depending on watcher correctness: the hash is computed
from the actual response body, so it can't lie about staleness
regardless of underlying watcher behavior.
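
A content-hash ETag of this kind can be sketched as follows. The
handler below is an assumption-laden illustration rather than the code
in this commit: listingETag, serveListing and fetch are hypothetical
names, the quoted strong-ETag form is assumed, and If-None-Match is
compared by exact string equality (no list or weak-validator
handling).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// listingETag returns a quoted ETag: the first 16 hex characters
// (64 bits) of the SHA-256 of the rendered body.
func listingETag(body []byte) string {
	sum := sha256.Sum256(body)
	return `"` + hex.EncodeToString(sum[:])[:16] + `"`
}

// serveListing writes body with ETag and Cache-Control headers, or a
// bare 304 when the client's If-None-Match equals the current ETag.
func serveListing(w http.ResponseWriter, r *http.Request, body []byte) {
	etag := listingETag(body)
	w.Header().Set("ETag", etag)
	w.Header().Set("Cache-Control", "private, max-age=0, must-revalidate")
	if r.Header.Get("If-None-Match") == etag {
		w.WriteHeader(http.StatusNotModified)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(body)
}

// fetch runs one GET through serveListing and returns status and ETag.
func fetch(body []byte, ifNoneMatch string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	serveListing(rec, req, body)
	return rec.Code, rec.Header().Get("ETag")
}

func main() {
	body := []byte(`{"entries":["a.txt","b.txt"]}`)

	status, etag := fetch(body, "")
	fmt.Println(status) // prints 200

	status, _ = fetch(body, etag)
	fmt.Println(status) // prints 304

	// The listing changed, so the old ETag no longer matches.
	status, _ = fetch([]byte(`{"entries":["a.txt"]}`), etag)
	fmt.Println(status) // prints 200
}
```

Because the tag is derived from the bytes about to be sent, the server
still renders the listing on every request; only the transfer is
skipped when nothing changed.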
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
322 lines
11 KiB
Go
// Package policy is the access-decision boundary for zddc-server.
//
// All ACL checks in handlers go through Decider.Allow rather than
// calling zddc.AllowedWithChain directly. This lets a deployment
// route policy decisions to an external OPA-compatible server
// (for federal customers running their own audited Rego policies)
// without changing handler code.
//
// Two implementations:
//
//   - InternalDecider: wraps zddc.AllowedWithChain. The default;
//     no new dependencies, identical semantics to the legacy code
//     path. This is what the docs in zddc/README.md describe.
//
//   - HTTPDecider: POSTs to OPA's canonical /v1/data/<package>/allow
//     endpoint over HTTP or a Unix-domain socket. Federal customers
//     deploy real OPA alongside zddc-server, write their own Rego,
//     and point ZDDC_OPA_URL at it.
//
// Configuration knob:
//
//	ZDDC_OPA_URL=                          # internal (default)
//	ZDDC_OPA_URL=internal                  # internal (explicit)
//	ZDDC_OPA_URL=http://127.0.0.1:8181     # external HTTP
//	ZDDC_OPA_URL=https://opa.example:8181  # external HTTPS
//	ZDDC_OPA_URL=unix:///run/opa/opa.sock  # external Unix socket
//
// Failure mode (external only): unreachable / non-2xx / malformed
// response → fail closed (deny), with a WARN log. Operators who
// prefer availability over correctness can set ZDDC_OPA_FAIL_OPEN=1
// to flip to fail-open with a WARN log instead.
package policy

import (
	"bytes"
	"context"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"log/slog"
	"net"
	"net/http"
	"net/url"
	"strings"
	"sync"
	"time"

	"codeberg.org/VARASYS/ZDDC/zddc/internal/zddc"
)

// AllowInput is the canonical input shape for Decider.Allow. It
// matches OPA's input convention: a JSON object passed as the
// "input" field of a /v1/data/<package>/allow query.
//
// External Rego policies can:
//   - read input.user.email (string)
//   - read input.path (string)
//   - walk input.policy_chain.levels[].acl.{allow,deny} for
//     custom cascade semantics, or read the pre-resolved
//     input.policy_chain.has_any_file when implementing the
//     same default-deny rule we use internally.
type AllowInput struct {
	User struct {
		Email string `json:"email"`
	} `json:"user"`
	Path        string             `json:"path"`
	PolicyChain *SerializableChain `json:"policy_chain,omitempty"`
}

// SerializableChain is a JSON-friendly view of zddc.PolicyChain.
// We don't tag zddc.PolicyChain itself because it's tightly coupled
// to the parser; the duplication is one struct.
type SerializableChain struct {
	Levels     []zddc.ZddcFile `json:"levels"`
	HasAnyFile bool            `json:"has_any_file"`
}

func chainToSerializable(c zddc.PolicyChain) *SerializableChain {
	return &SerializableChain{Levels: c.Levels, HasAnyFile: c.HasAnyFile}
}

// Decider is the access-decision interface every handler uses.
type Decider interface {
	Allow(ctx context.Context, input AllowInput) (bool, error)
}

// Config selects and parameterizes the decider.
type Config struct {
	URL      string        // raw value: "", "internal", "http(s)://...", "unix:///path"
	FailOpen bool          // external mode only: on any decision failure (transport error, non-2xx, malformed response), allow instead of deny
	CacheTTL time.Duration // external mode only: per-decision cache TTL. Zero = default 1s. Negative = no cache.
}

// New constructs a Decider per cfg.URL semantics.
//   - "" or "internal" → InternalDecider (no cache; the in-process
//     evaluator is already cheaper than a cache lookup would be)
//   - "http(s)://..." → HTTPDecider wrapped in a small per-decision
//     cache (default 1s TTL: short enough that staleness is bounded
//     to the same window as fsnotify-debounced index refresh, long
//     enough to amortize bursty listings like .archive enumeration
//     into one OPA round-trip per (email, decision-input))
//   - "unix:///..." → same as http(s), over a Unix socket
//
// Returns an error if URL is unrecognized.
func New(cfg Config) (Decider, error) {
	if cfg.URL == "" || strings.EqualFold(cfg.URL, "internal") {
		return &InternalDecider{}, nil
	}
	var inner Decider
	var err error
	switch {
	case strings.HasPrefix(cfg.URL, "http://"), strings.HasPrefix(cfg.URL, "https://"):
		inner, err = newHTTPDecider(cfg.URL, cfg.FailOpen, nil)
	case strings.HasPrefix(cfg.URL, "unix://"):
		path := strings.TrimPrefix(cfg.URL, "unix://")
		dialer := &net.Dialer{Timeout: 2 * time.Second}
		transport := &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return dialer.DialContext(ctx, "unix", path)
			},
		}
		inner, err = newHTTPDecider("http://opa-unix-socket", cfg.FailOpen, transport)
	default:
		return nil, fmt.Errorf("unrecognized ZDDC_OPA_URL %q (want \"internal\", http(s)://..., or unix:///...)", cfg.URL)
	}
	if err != nil {
		return nil, err
	}
	ttl := cfg.CacheTTL
	if ttl == 0 {
		ttl = time.Second
	}
	if ttl < 0 {
		// Negative TTL = caching disabled (test seam).
		return inner, nil
	}
	return &cachingDecider{inner: inner, ttl: ttl}, nil
}

// InternalDecider routes Allow through zddc.AllowedWithChain. No
// network, no Rego, no new dependencies: the same Go evaluator the
// existing test suite covers.
type InternalDecider struct{}

func (d *InternalDecider) Allow(_ context.Context, input AllowInput) (bool, error) {
	chain := zddc.PolicyChain{}
	if input.PolicyChain != nil {
		chain.Levels = input.PolicyChain.Levels
		chain.HasAnyFile = input.PolicyChain.HasAnyFile
	}
	return zddc.AllowedWithChain(chain, input.User.Email), nil
}

// HTTPDecider POSTs to /v1/data/zddc/access/allow on the configured
// endpoint. Spec:
//   - request body {"input": <AllowInput>}
//   - response body {"result": true|false}
//   - 5-second per-request timeout
//   - non-2xx, transport error, missing/malformed result → policy
//     decision is "deny" unless FailOpen=true
//
// The path "/v1/data/zddc/access/allow" is the OPA convention; the
// "zddc.access" Rego package on an external server would expose
// `allow` for this endpoint.
type HTTPDecider struct {
	endpoint string
	client   *http.Client
	failOpen bool
}

func newHTTPDecider(endpoint string, failOpen bool, transport http.RoundTripper) (*HTTPDecider, error) {
	if _, err := url.Parse(endpoint); err != nil {
		return nil, fmt.Errorf("invalid OPA URL %q: %w", endpoint, err)
	}
	c := &http.Client{Timeout: 5 * time.Second}
	if transport != nil {
		c.Transport = transport
	}
	return &HTTPDecider{
		endpoint: strings.TrimRight(endpoint, "/") + "/v1/data/zddc/access/allow",
		client:   c,
		failOpen: failOpen,
	}, nil
}

type opaResponse struct {
	Result *bool `json:"result"`
}

func (d *HTTPDecider) Allow(ctx context.Context, input AllowInput) (bool, error) {
	body, err := json.Marshal(struct {
		Input AllowInput `json:"input"`
	}{Input: input})
	if err != nil {
		return d.failResult(fmt.Errorf("marshal input: %w", err))
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, d.endpoint, bytes.NewReader(body))
	if err != nil {
		return d.failResult(fmt.Errorf("build request: %w", err))
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", "application/json")

	resp, err := d.client.Do(req)
	if err != nil {
		return d.failResult(fmt.Errorf("opa request: %w", err))
	}
	defer resp.Body.Close()

	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		// Read up to 512 bytes of the error body for the log without
		// blowing up on a verbose OPA error page.
		snippet, _ := io.ReadAll(io.LimitReader(resp.Body, 512))
		return d.failResult(fmt.Errorf("opa returned %d: %s", resp.StatusCode, strings.TrimSpace(string(snippet))))
	}
	var parsed opaResponse
	if err := json.NewDecoder(resp.Body).Decode(&parsed); err != nil {
		return d.failResult(fmt.Errorf("decode opa response: %w", err))
	}
	if parsed.Result == nil {
		return d.failResult(errors.New("opa response missing 'result' field"))
	}
	return *parsed.Result, nil
}

// failResult logs the failure and returns the configured fail-mode
// decision. Logged at WARN so a healthy run is silent but a sick OPA
// is loud.
func (d *HTTPDecider) failResult(err error) (bool, error) {
	if d.failOpen {
		slog.Warn("policy decision failed; failing open (allow)", "endpoint", d.endpoint, "err", err)
		return true, nil
	}
	slog.Warn("policy decision failed; failing closed (deny)", "endpoint", d.endpoint, "err", err)
	return false, nil
}

// AllowFromChain is a convenience for callers that already have a
// PolicyChain in hand. Equivalent to constructing AllowInput manually
// from (chain, email, path) and calling d.Allow.
func AllowFromChain(ctx context.Context, d Decider, chain zddc.PolicyChain, email, path string) (bool, error) {
	in := AllowInput{Path: path, PolicyChain: chainToSerializable(chain)}
	in.User.Email = email
	return d.Allow(ctx, in)
}

// cachingDecider wraps another Decider with a small per-decision cache.
// Designed for the external-OPA hot path: a single .archive listing or
// directory enumeration can hit the same (email, dir-policy) tuple
// dozens of times in milliseconds, and a remote OPA round-trip per
// query would dominate latency. The 1s default TTL bounds staleness to
// the same window as the fsnotify watcher's debounce, so a `.zddc` edit
// is reflected in the next listing rather than carried over indefinitely.
//
// Key shape: SHA-256 of the canonical JSON-serialized AllowInput. This
// makes the cache safe across all input variations (different paths,
// different chains, different users) without us having to enumerate
// the dimensions.
type cachingDecider struct {
	inner Decider
	ttl   time.Duration

	mu      sync.Mutex
	entries map[string]cacheEntry
}

type cacheEntry struct {
	expires time.Time
	allow   bool
}

func (d *cachingDecider) Allow(ctx context.Context, input AllowInput) (bool, error) {
	key, err := cacheKey(input)
	if err != nil {
		// Couldn't key: fall through to inner without caching. Should
		// never happen in practice; AllowInput marshals as plain JSON.
		return d.inner.Allow(ctx, input)
	}

	now := time.Now()
	d.mu.Lock()
	if d.entries == nil {
		d.entries = make(map[string]cacheEntry)
	}
	if e, ok := d.entries[key]; ok && now.Before(e.expires) {
		d.mu.Unlock()
		return e.allow, nil
	}
	d.mu.Unlock()

	// The inner call runs outside the lock, so concurrent misses on
	// the same key may each reach the inner decider.
	allow, err := d.inner.Allow(ctx, input)
	if err != nil {
		// Errors are never cached. Note that HTTPDecider reports
		// failures as a fail-mode decision with a nil error, so those
		// decisions are cached for one TTL like any other.
		return allow, err
	}

	d.mu.Lock()
	// Best-effort eviction of expired entries; keeps the map from
	// growing unbounded under high cardinality. O(n) but capped to
	// occasional sweeps; fine for this scale.
	if len(d.entries) > 4096 {
		for k, e := range d.entries {
			if now.After(e.expires) {
				delete(d.entries, k)
			}
		}
	}
	d.entries[key] = cacheEntry{expires: now.Add(d.ttl), allow: allow}
	d.mu.Unlock()
	return allow, nil
}

func cacheKey(input AllowInput) (string, error) {
	b, err := json.Marshal(input)
	if err != nil {
		return "", err
	}
	h := sha256.Sum256(b)
	return hex.EncodeToString(h[:]), nil
}