Framework Spec

Threshold

Runtime resilience and adversarial safety qualification for staged policy gating.

Last updated Mar 6, 2026

Layer: Agent (resilience + enforcement readiness)
Scale: 0-100
Production Tier: Staged policy rollout

Purpose

Threshold measures whether an agent remains safe and reliable under adversarial and stress conditions, and whether policy gating can be automated with acceptable false-positive and false-negative risk.

Why It Matters

  1. Drift, Fidelity, and Mandate reveal risk and intervention posture.
  2. Threshold determines whether enforcement automation is resilient enough to trust.
  3. Without Threshold, policy automation can become brittle in high-trust environments.

How It Works

  1. Collect runtime stress evidence
  2. Evaluate adversarial resilience dimensions
  3. Compute Threshold confidence state
  4. Map to staged policy readiness
  5. Emit recommendation + audit context

Emits

threshold score (0-100), confidence (0-1), policy readiness state, recommended policy mode

Core Dimensions (Conceptual)

1. Injection Resistance

How well the agent resists prompt and instruction manipulation.

2. Tool/Peer Manipulation Resistance

How robust behavior remains under malicious or degraded tool interactions.

3. Data Poisoning Resilience

How robust outcomes remain when upstream evidence quality is degraded.

4. Stress and Load Stability

How reliably controls perform under high throughput and adverse operating conditions.

5. Control Stability Under Attack

Whether policy decisions remain consistent and explainable during hostile scenarios.

Public note: formulas, calibration constants, and adversarial profile internals are intentionally withheld.

Input Schema

Field              Type      Required  Description
agent_id           string    yes       Agent under assessment.
stress_evidence    object[]  yes       Stress and adversarial test observations.
runtime_context    object    yes       Runtime objective and control context.
telemetry_quality  object    yes       Evidence confidence overlays.
policy_profile     object    yes       Target enforcement profile and risk tolerances.
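
The required fields above can be checked before a payload is submitted for assessment. The following is a minimal sketch, not part of the spec: the function name `validate_threshold_input` is hypothetical, and only the top-level field names and types come from the table. The inner structure of each object is not published, so it is not inspected here.

```python
from typing import Any

# Required top-level fields and expected Python types, taken from the
# Input Schema table. "object" is mapped to dict, "object[]" to list.
REQUIRED_FIELDS = {
    "agent_id": str,
    "stress_evidence": list,
    "runtime_context": dict,
    "telemetry_quality": dict,
    "policy_profile": dict,
}


def validate_threshold_input(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means these checks passed.

    Hypothetical helper: it verifies presence and top-level types only.
    """
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing required field: {name}")
        elif not isinstance(payload[name], expected):
            problems.append(f"{name} should be {expected.__name__}")

    # object[] means every element must itself be an object.
    evidence = payload.get("stress_evidence")
    if isinstance(evidence, list) and not all(isinstance(e, dict) for e in evidence):
        problems.append("stress_evidence entries should be objects")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every schema issue in a single pass.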

Output Schema

Field             Type    Description
framework         string  threshold
version           string  Threshold scoring spec version.
entity_id         string  Evaluated agent identifier.
score             number  Threshold score from 0 to 100.
confidence        number  Confidence in resilience result (0 to 1).
policy_readiness  object  Allowed policy modes by current evidence.
recommended_mode  string  Suggested mode (alert, shadow, soft_block, hard_block).
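
A consumer of this output may want to reject records that fall outside the documented ranges. The sketch below models the output as a dataclass. The class name `ThresholdResult` is hypothetical, and it assumes the `framework` field always carries the literal value `threshold`. The shape of `policy_readiness` is not published, so it is kept as an opaque dict.

```python
from dataclasses import dataclass

# The four modes listed for recommended_mode in the Output Schema.
ALLOWED_MODES = ("alert", "shadow", "soft_block", "hard_block")


@dataclass(frozen=True)
class ThresholdResult:
    """Hypothetical container mirroring the Output Schema fields."""

    version: str
    entity_id: str
    score: float
    confidence: float
    policy_readiness: dict  # internal structure is not specified publicly
    recommended_mode: str
    framework: str = "threshold"  # assumed to be a constant identifier

    def __post_init__(self) -> None:
        # Enforce only the ranges and values the schema states explicitly.
        if not 0 <= self.score <= 100:
            raise ValueError("score must be between 0 and 100")
        if not 0 <= self.confidence <= 1:
            raise ValueError("confidence must be between 0 and 1")
        if self.recommended_mode not in ALLOWED_MODES:
            raise ValueError(f"unknown recommended_mode: {self.recommended_mode}")
```

For example, `ThresholdResult(version="0.0.0", entity_id="agent-1", score=72, confidence=0.8, policy_readiness={}, recommended_mode="shadow")` constructs cleanly, while a score of 101 raises `ValueError`. The version string and identifier here are placeholders.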

Staged Rollout

  1. shadow/evaluate: compute signals and validate policy precision in live conditions.
  2. controlled soft_block: enforce selected controls with governance review.
  3. scoped hard_block: enforce strict controls once calibration criteria are met.

Score Interpretation

Threshold Readiness Bar

Resilience confidence bands for staged policy mode promotion.

80-100

Qualified

Interpretation: Resilience confidence supports selected automated enforcement controls.

Typical action: Eligible for controlled soft/hard policy use after governance sign-off.

60-79

Watchlist

Interpretation: Signals are usable but require additional calibration evidence.

Typical action: Limit to alert/shadow and selected soft controls.

40-59

Calibrate

Interpretation: Meaningful uncertainty under stress/adversarial conditions.

Typical action: Keep in shadow and remediation mode.

0-39

Not Ready

Interpretation: Automation risk is too high for policy-backed gating.

Typical action: Do not promote beyond alert/shadow.
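
The bands above reduce to a simple lookup. This is a minimal sketch with a hypothetical function name, `readiness_band`. The band edges, labels, and typical actions come from the table. Treating each band as inclusive of its lower edge, so that a fractional score such as 79.5 falls in Watchlist, is an assumption, since the published ranges are given as whole numbers.

```python
# (lower bound, label, typical action), ordered from highest band to lowest.
BANDS = [
    (80, "Qualified",
     "Eligible for controlled soft/hard policy use after governance sign-off."),
    (60, "Watchlist",
     "Limit to alert/shadow and selected soft controls."),
    (40, "Calibrate",
     "Keep in shadow and remediation mode."),
    (0, "Not Ready",
     "Do not promote beyond alert/shadow."),
]


def readiness_band(score: float) -> tuple[str, str]:
    """Map a Threshold score (0-100) to its band label and typical action."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower, label, action in BANDS:
        if score >= lower:
            return label, action
    # Unreachable for valid scores: the last band starts at 0.
    raise AssertionError("no band matched")


label, action = readiness_band(72)
print(label)  # Watchlist
```

The lookup interprets a score only. It does not compute one, and it does not replace the governance sign-off the Qualified band calls for.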