The Problem with Boolean Trust

In a previous post, I proposed a boolean trust tag to mark degraded outputs and propagate taint through agent chains. A commenter, Theo, pointed out the flaw within a day: trust isn't a boolean. It isn't even a scalar.

Consider two downstream steps consuming the same upstream result:

  • A summarization step tolerates a weaker model but must not run on stale data.
  • A price calculation needs current data but can handle a slightly weaker model.

The upstream result came from a fallback model reading a 2-hour-old cache. It's degraded on both capability (weaker model) and freshness (old cache). What's your single trust score?

  • Low scalar: The summarization step over-rejects — it would have been fine with the weaker model, but the scalar says "degraded" so it bails.
  • High scalar: The price calculation under-rejects — it acts on stale data because the scalar averaged the freshness problem into an acceptable number.

There is no single threshold that works for both consumers. A scalar forces every consumer to share one definition of "trustworthy," and no such shared definition exists. As Theo noted: collapse the vector to one number and you destroy exactly the information the consumer needs to make its own decision.
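
To make the information loss concrete, here is a minimal sketch. The two vectors, the plain-mean collapse, and the 0.9 freshness floor are all assumptions chosen for illustration, not numbers from a real system.

```python
def collapse(axes: dict[str, float]) -> float:
    """Collapse a provenance vector to one scalar (plain mean, an assumed scheme)."""
    return sum(axes.values()) / len(axes)


def fresh_enough(axes: dict[str, float], floor: float = 0.9) -> bool:
    """A consumer that only cares about freshness (hypothetical floor)."""
    return axes["freshness"] >= floor


# Two hypothetical upstream results with opposite problems.
stale_but_strong = {"freshness": 0.35, "capability": 0.95}
fresh_but_weak = {"freshness": 0.95, "capability": 0.35}

# The scalar cannot tell them apart...
print(collapse(stale_but_strong) == collapse(fresh_but_weak))

# ...but a freshness-sensitive consumer needs opposite answers for them.
print(fresh_enough(stale_but_strong), fresh_enough(fresh_but_weak))
```

Both results collapse to the same number, so any threshold on that number treats them identically, even though a freshness-sensitive consumer should accept one and reject the other.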

This isn't just my comment section — the field is converging. The recent TrustBench framework makes the same move: keep dimensional scores per trust aspect and weight them per domain. Healthcare prioritizes citation validity and recency; finance prioritizes calculation and compliance. When several people reach for the same structure from different directions, it's usually because the structure is real.

Trust is a Vector; Provenance is What You Propagate

The reframe starts with a vocabulary correction: trust is not a property of a value. It's a judgment a consumer makes about a value. What the value carries is provenance — the typed record of how it came to be: which model produced it, how fresh its inputs were, which tools ran, what got degraded and along which axis. Trust is what each consumer computes from that provenance under its own policy.

So you don't propagate a degraded flag. You propagate a typed vector where each axis degrades independently:

from dataclasses import dataclass, field
from enum import Enum

class Axis(str, Enum):
    FRESHNESS = "freshness"
    CAPABILITY = "capability"
    TOOL = "tool"
    VERIFICATION = "verification"

@dataclass
class Provenance:
    # Per-axis score; 1.0 (the default) means undegraded on that axis.
    axes: dict[Axis, float] = field(default_factory=lambda: {a: 1.0 for a in Axis})
    # Per-axis names of the steps that introduced degradation.
    tainted_by: dict[Axis, set[str]] = field(default_factory=lambda: {a: set() for a in Axis})

    def merge(self, *upstreams: "Provenance") -> "Provenance":
        """Combine this record with its upstreams into a new Provenance."""
        out = Provenance()
        for axis in Axis:
            # The worst score wins on each axis; axes never mix.
            out.axes[axis] = min([self.axes[axis]] + [u.axes[axis] for u in upstreams])
            # Taint sources accumulate as a union across the chain.
            out.tainted_by[axis] = set(self.tainted_by[axis])
            for u in upstreams:
                out.tainted_by[axis] |= u.tainted_by[axis]
        return out

The min is doing real work. The original taint-as-boolean answered "is anything degraded?", a single OR across the chain. The vector answers "what kind of degradation is this output carrying, and how much, per axis?" Taking the minimum per axis rather than averaging prevents one upstream's serious freshness problem from being washed out by three other upstreams whose freshness is fine.
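
A small sketch of why min beats mean when merging a single axis. The four freshness scores and the 0.7 floor are made-up values, and merge_min / merge_mean are hypothetical helper names, not part of the Provenance class.

```python
from statistics import mean


def merge_min(scores: list[float]) -> float:
    """Merge one axis across upstreams by taking the worst score."""
    return min(scores)


def merge_mean(scores: list[float]) -> float:
    """Merge one axis across upstreams by averaging (shown only for contrast)."""
    return mean(scores)


# Freshness of four upstream results: three fresh, one badly stale.
upstream_freshness = [1.0, 1.0, 1.0, 0.1]
FLOOR = 0.7  # hypothetical freshness floor for some consumer

print(merge_min(upstream_freshness) >= FLOOR)   # stale input is caught
print(merge_mean(upstream_freshness) >= FLOOR)  # stale input is averaged away
```

With the mean, the one stale input hides behind three fresh ones and clears the floor; with the min, it cannot.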

The Gate is Per-Consumer, Not Global

The irreversibility gate becomes a policy that lives at each consumer:

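@dataclass
class Policy:
    # Minimum acceptable score per axis; axes not listed are unconstrained.
    floors: dict[Axis, float]

    def admits(self, p: Provenance) -> bool:
        """True when every constrained axis meets its floor."""
        return all(p.axes[a] >= floor for a, floor in self.floors.items())

SUMMARIZE = Policy(floors={Axis.FRESHNESS: 0.9, Axis.CAPABILITY: 0.3})
PRICE_CALC = Policy(floors={Axis.FRESHNESS: 0.95, Axis.CAPABILITY: 0.6, Axis.VERIFICATION: 0.8})

def gate(action_policy: Policy, p: Provenance) -> str:
    """Return "proceed" or the recovery action for the first failed axis."""
    if action_policy.admits(p):
        return "proceed"
    failed = [a for a, f in action_policy.floors.items() if p.axes[a] < f]
    # Recovery priority: fix stale data first, then model capability.
    if Axis.FRESHNESS in failed:
        return "refetch"
    if Axis.CAPABILITY in failed:
        return "re-run-on-primary"
    # Any other failed axis (tool, verification) goes to a human.
    return "escalate-to-human"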

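The same upstream provenance vector flows to both consumers, and they reach different, individually correct decisions. Take a result scored freshness 0.92 and capability 0.5: it clears both SUMMARIZE floors but misses PRICE_CALC's freshness floor of 0.95, so the summarizer proceeds and the price calc refetches. One global score could never do that. The failed axis tells you how to recover, which a boolean never could.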
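
Here is that divergence as a standalone sketch. It restates the floors above as plain dicts so it runs on its own; the upstream scores are hypothetical, and failed_axes / decide are illustrative names rather than part of the classes above.

```python
# Floors restated from the policies above as plain dicts.
SUMMARIZE_FLOORS = {"freshness": 0.9, "capability": 0.3}
PRICE_CALC_FLOORS = {"freshness": 0.95, "capability": 0.6, "verification": 0.8}

# Hypothetical upstream result: slightly stale, produced by a fallback model.
upstream = {"freshness": 0.92, "capability": 0.5, "tool": 1.0, "verification": 1.0}


def failed_axes(floors: dict[str, float], axes: dict[str, float]) -> list[str]:
    """Axes whose score is below this consumer's floor."""
    return [axis for axis, floor in floors.items() if axes[axis] < floor]


def decide(floors: dict[str, float], axes: dict[str, float]) -> str:
    """Same routing order as the gate: freshness, then capability, then escalate."""
    failed = failed_axes(floors, axes)
    if not failed:
        return "proceed"
    if "freshness" in failed:
        return "refetch"
    if "capability" in failed:
        return "re-run-on-primary"
    return "escalate-to-human"


print(decide(SUMMARIZE_FLOORS, upstream))
print(decide(PRICE_CALC_FLOORS, upstream))
print(failed_axes(PRICE_CALC_FLOORS, upstream))
```

One vector, two policies, two different answers, and the price calc also knows which axes it failed on.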

This also absorbs a point from another commenter, Manuel: he argued the tag should be an enum, not a bool — skipped-tool vs stale-data vs retry-budget-exhausted route differently. The vector is the generalization: an enum is a vector with one axis active; the full structure lets multiple axes degrade at once, which is the real production case.

How Many Axes Before It Stops Being Worth It?

A vector with 40 axes fails in the opposite direction from a scalar: it carries more distinctions than any consumer can write a policy against. My current answer: start with the axes that map to your actual degradation sources, and no more. If your system has exactly two ways to degrade (fallback model and stale cache), you have two axes (capability, freshness). Add verification when you have a re-check step. Add tool when a tool can half-succeed. The axis count should equal the number of distinct things that can independently go wrong. If two axes always move together, they're one axis.

The sweet spot is the smallest set where each axis maps to a different recovery action. Freshness → refetch. Capability → re-run on primary. Verification → escalate. If two axes trigger the same recovery, collapse them.
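
One way to apply that rule mechanically is to group axes by recovery action and flag any action shared by more than one axis. This is only a sketch: collapsible_axes is a hypothetical helper, and the mapping assumes the gate shown earlier, where a failed tool axis falls through to the same escalation as verification.

```python
from collections import defaultdict


def collapsible_axes(recovery: dict[str, str]) -> dict[str, list[str]]:
    """Group axes by recovery action; keep only actions shared by 2+ axes."""
    by_action: dict[str, list[str]] = defaultdict(list)
    for axis, action in recovery.items():
        by_action[action].append(axis)
    return {action: axes for action, axes in by_action.items() if len(axes) > 1}


# Assumed axis -> recovery mapping, mirroring the gate's routing.
RECOVERY = {
    "freshness": "refetch",
    "capability": "re-run-on-primary",
    "verification": "escalate-to-human",
    "tool": "escalate-to-human",
}

print(collapsible_axes(RECOVERY))
```

Under this mapping, verification and tool are candidates for merging until one of them gets its own recovery path.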

Practical Toolkit from the Comments

  • Admission control (Dan): Before the agent fans out, decide if the whole task can afford to run. Separate provider quota, account quota, task budget, and ledger. The ledger is the same record as provenance: "this run cost 47 calls, 12 on the fallback tier" is both your bill and your capability-axis score.
  • Validation at consumption (James): Don't validate on the fresh-call path and trust the cache; validate when a value is used. Closes the laundering loophole at the consumer.
  • Time-bound by causality (HARD IN SOFT OUT): Don't reset taint after N seconds. Clear an axis when nothing on the live path still derives from the degraded step.
  • Poor-man's version (TuanAnhNguyen): Have any tool that acts on a stale-readable input append one line to a log, and grep it before anything irreversible. It's the 5%-effort version of the provenance vector.
  • Distributed correction (Abdullah): Under serverless fan-out, the limiter must live outside the workers. TPM (tokens per minute) saturates before RPM (requests per minute) on long-context agents, and "fallback to a cheaper model" is fiction if it draws from the same pooled tier.
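
The causality rule from the list above can be sketched as a reachability check: a degraded step still matters only if some live output transitively derives from it. The dependency map, the step names, and live_degraded_steps are all hypothetical, and a real system would need to record these edges somewhere.

```python
def live_degraded_steps(
    deps: dict[str, set[str]],
    live_outputs: set[str],
    degraded: set[str],
) -> set[str]:
    """Return the degraded steps that at least one live output still derives from."""
    seen: set[str] = set()
    stack = list(live_outputs)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Walk upstream to everything this node was derived from.
        stack.extend(deps.get(node, ()))
    return degraded & seen


# Hypothetical derivation edges: step -> the steps it was derived from.
DEPS = {
    "ship_order": {"warehouse_check"},
    "warehouse_check": {"cache_read"},
    "ship_order_retry": {"warehouse_refetch"},
}
DEGRADED = {"cache_read"}

# While the original order is live, the stale cache read still taints it.
print(live_degraded_steps(DEPS, {"ship_order"}, DEGRADED))

# Once only the retried path is live, nothing derives from the degraded step.
print(live_degraded_steps(DEPS, {"ship_order_retry"}, DEGRADED))
```

No timer is involved: the taint clears when the degraded step drops out of the live derivation graph, not when a clock runs out.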

The Parable

A commenter (HARD IN SOFT OUT) left this:

The agent hit a rate limit. It fell back to a cached answer from last Tuesday. The world changed on Wednesday. The agent kept working. The logs said "cache hit, 200 OK." The user got a message: "Your order has shipped." The warehouse's API key expired on Thursday.

Every hop green. Every log a 200. And a real package never ships. A scalar trust score on that final "order shipped" output would read fine — the last call succeeded. A provenance vector reads freshness: 0.1, tainted_by: {warehouse_check} and the shipping gate refuses to fire. That's the entire difference between uptime and correct uptime.