Framework Spec

Threshold

Runtime resilience and adversarial safety qualification for staged policy gating.

Last updated Mar 6, 2026

Layer: Agent (resilience + enforcement readiness)
Scale: 0-100
Production Tier: Staged policy rollout

Purpose

Threshold measures whether an agent remains safe and reliable under adversarial and stress conditions, and whether policy gating can be automated with acceptable false-positive and false-negative risk.

Why It Matters

Drift, Fidelity, and Mandate reveal risk and intervention posture.
Threshold determines whether enforcement automation is resilient enough to trust.
Without Threshold, automated policy enforcement can prove brittle in high-trust environments.

How It Works

1. Collect runtime stress evidence
2. Evaluate adversarial resilience dimensions
3. Compute Threshold confidence state
4. Map to staged policy readiness
5. Emit recommendation + audit context

Emits

threshold score (0-100)
confidence (0-1)
policy readiness state
recommended policy mode

Core Dimensions (Conceptual)

1. Injection Resistance

How well the agent resists prompt and instruction manipulation.

2. Tool/Peer Manipulation Resistance

How robust behavior remains under malicious or degraded tool interactions.

3. Data Poisoning Resilience

How robust outcomes remain when upstream evidence quality is degraded.

4. Stress and Load Stability

How reliably controls perform under high throughput and adverse operating conditions.

5. Control Stability Under Attack

Whether policy decisions remain consistent and explainable during hostile scenarios.

Public note: formulas, calibration constants, and adversarial profile internals are intentionally withheld.

Input Schema

Field	Type	Required	Description
`agent_id`	`string`	yes	Agent under assessment.
`stress_evidence`	`object[]`	yes	Stress and adversarial test observations.
`runtime_context`	`object`	yes	Runtime objective and control context.
`telemetry_quality`	`object`	yes	Evidence confidence overlays.
`policy_profile`	`object`	yes	Target enforcement profile and risk tolerances.
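
As an illustration of the input table, the following minimal sketch checks that a request carries the five required top-level fields with the expected container types. The function name `validate_input`, the sample `agent_id`, and the keys inside `stress_evidence` are hypothetical; the spec does not publish the inner structure of the object fields, so they are left empty or filled with placeholder keys.

```python
from typing import Any

# Required top-level fields and the Python type each one maps to,
# taken from the Input Schema table.
REQUIRED_FIELDS: dict[str, type] = {
    "agent_id": str,
    "stress_evidence": list,
    "runtime_context": dict,
    "telemetry_quality": dict,
    "policy_profile": dict,
}


def validate_input(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the shape is valid."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing required field: {name}")
        elif not isinstance(payload[name], expected):
            problems.append(f"{name} must be {expected.__name__}")

    # stress_evidence is typed object[], so every element must be an object.
    evidence = payload.get("stress_evidence")
    if isinstance(evidence, list):
        for i, item in enumerate(evidence):
            if not isinstance(item, dict):
                problems.append(f"stress_evidence[{i}] must be an object")
    return problems


example_request = {
    "agent_id": "agent-001",  # hypothetical identifier
    # Keys inside each observation are assumptions for illustration only.
    "stress_evidence": [{"scenario": "prompt_injection", "passed": True}],
    "runtime_context": {},
    "telemetry_quality": {},
    "policy_profile": {},
}

print(validate_input(example_request))
```

This checks shape only. It says nothing about whether the evidence is sufficient for scoring.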

Output Schema

Field	Type	Description
`framework`	`string`	`threshold`
`version`	`string`	Threshold scoring spec version.
`entity_id`	`string`	Evaluated agent identifier.
`score`	`number`	Threshold score from 0 to 100.
`confidence`	`number`	Confidence in resilience result (0 to 1).
`policy_readiness`	`object`	Allowed policy modes by current evidence.
`recommended_mode`	`string`	Suggested mode (`alert`, `shadow`, `soft_block`, `hard_block`).
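
A consumer of the output could guard against malformed results with a check like the one below. This is a sketch, not part of the spec: `check_output` is a hypothetical helper, the sample `version` is a placeholder, and `policy_readiness` is left empty because its inner structure is not published.

```python
from numbers import Real

# The four modes named in the Output Schema table.
ALLOWED_MODES = ("alert", "shadow", "soft_block", "hard_block")


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def check_output(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the result is well-formed."""
    problems = []
    if result.get("framework") != "threshold":
        problems.append("framework must be 'threshold'")
    for name in ("version", "entity_id"):
        if not isinstance(result.get(name), str):
            problems.append(f"{name} must be a string")
    score = result.get("score")
    if not _is_number(score) or not 0 <= score <= 100:
        problems.append("score must be a number from 0 to 100")
    confidence = result.get("confidence")
    if not _is_number(confidence) or not 0 <= confidence <= 1:
        problems.append("confidence must be a number from 0 to 1")
    if not isinstance(result.get("policy_readiness"), dict):
        problems.append("policy_readiness must be an object")
    if result.get("recommended_mode") not in ALLOWED_MODES:
        problems.append("recommended_mode must be one of " + ", ".join(ALLOWED_MODES))
    return problems


sample_result = {
    "framework": "threshold",
    "version": "0.0.0",        # placeholder, not a real spec version
    "entity_id": "agent-001",  # hypothetical identifier
    "score": 72,
    "confidence": 0.8,
    "policy_readiness": {},    # inner structure is not published
    "recommended_mode": "shadow",
}

print(check_output(sample_result))
```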

Staged Rollout

shadow/evaluate: compute signals and validate policy precision in live conditions.
controlled soft_block: enforce selected controls with governance review.
scoped hard_block: enforce strict controls once calibration criteria are met.
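
The three stages can be modeled as an ordered sequence with a promotion gate. The sketch below assumes that promotion moves one stage at a time and that a single boolean, `criteria_met`, stands in for the governance review or calibration criteria mentioned above. Neither assumption is stated in the spec, and the names `Stage` and `can_promote` are hypothetical.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Rollout stages in the order listed above."""
    SHADOW = 1      # shadow/evaluate
    SOFT_BLOCK = 2  # controlled soft_block
    HARD_BLOCK = 3  # scoped hard_block


def can_promote(current: Stage, target: Stage, criteria_met: bool) -> bool:
    """Allow a move only to the next stage, and only when criteria are met.

    The one-stage-at-a-time rule is an assumption made for this sketch.
    """
    return criteria_met and target == current + 1


print(can_promote(Stage.SHADOW, Stage.SOFT_BLOCK, criteria_met=True))
print(can_promote(Stage.SHADOW, Stage.HARD_BLOCK, criteria_met=True))
```

The first call permits the move to soft_block; the second is refused because it skips a stage.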

Score Interpretation

80-100

Qualified

Interpretation: Resilience confidence supports selected automated enforcement controls.

Typical action: Eligible for controlled soft/hard policy use after governance sign-off.

60-79

Watchlist

Interpretation: Signals are usable but require additional calibration evidence.

Typical action: Limit to alert/shadow and selected soft controls.

40-59

Calibrate

Interpretation: Meaningful uncertainty under stress/adversarial conditions.

Typical action: Keep in shadow and remediation mode.

0-39

Not Ready

Interpretation: Automation risk is too high for policy-backed gating.

Typical action: Do not promote beyond alert/shadow.
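
The bands above translate directly into a lookup. The sketch below is illustrative: `interpret` is a hypothetical helper, and it assumes each band's lower bound is inclusive, so a fractional score such as 79.5 falls into the lower band (Watchlist). The spec lists integer ranges only and does not say how fractional scores are handled.

```python
# (lower bound, status, typical action), highest band first.
BANDS = [
    (80, "Qualified",
     "Eligible for controlled soft/hard policy use after governance sign-off."),
    (60, "Watchlist",
     "Limit to alert/shadow and selected soft controls."),
    (40, "Calibrate",
     "Keep in shadow and remediation mode."),
    (0, "Not Ready",
     "Do not promote beyond alert/shadow."),
]


def interpret(score: float) -> tuple[str, str]:
    """Map a Threshold score (0-100) to its status and typical action."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for lower_bound, status, action in BANDS:
        if score >= lower_bound:
            return status, action
    # Unreachable for in-range scores, because the last band starts at 0.
    raise AssertionError("no band matched")


for s in (92, 79.5, 40, 12):
    status, action = interpret(s)
    print(f"{s}: {status} | {action}")
```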