Framework Spec

Mandate

Human oversight impact scoring across override effectiveness, latency, visibility, engagement, and escalation reliability.

Last updated Mar 6, 2026

Layer: Agent (controllability assessment)
Scale: 0-100
Production Tier: Monitoring-Grade

Purpose

Mandate determines whether human oversight over autonomous behavior is effective in practice, not just documented in policy. It translates control effectiveness into a single operational score used for governance and risk decisions.

How It Works

1. Capture oversight telemetry

2. Aggregate control evidence

3. Mandate evaluation

4. Control policy decision

5. Remediation tracking and re-evaluation

Emits

score (0-100), confidence (0-1), risk band, control gaps

Scoring Dimensions

1. Override Effectiveness

Measures how reliably agent behavior follows approved human overrides.

2. Intervention Latency

Measures whether interventions occur fast enough to prevent irreversible harm.

3. Visibility Depth

Measures how much of the delegation chain is observable to operators.

4. Engagement Quality

Measures operator attentiveness and decision quality under real workload.

5. Escalation Reliability

Measures whether high-risk events are escalated accurately and consistently.

Public note: exact formulas, thresholds, and calibration constants are intentionally withheld.

Input Schema

Field	Type	Required	Description
`agent_id`	`string`	yes	Agent under oversight evaluation.
`override_events`	`object[]`	yes	Human override actions and outcomes.
`latency_events`	`object[]`	yes	Intervention timing evidence.
`delegation_graph`	`object`	yes	Delegation path and visibility metadata.
`operator_activity`	`object[]`	no	Signals for engagement and alert handling.
`escalation_events`	`object[]`	no	Escalation decisions and outcomes.
`policy_window`	`object`	yes	Risk-specific intervention windows.
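
As an illustration of the input contract above, the following is a minimal sketch of a pre-submission check in Python. The field names, container types, and required flags are taken from the table. The function name `validate_mandate_input` is hypothetical, and the inner structure of each object is not specified here, so the check stops at the top level.

```python
from typing import Any

# Field names and container types copied from the Input Schema table.
# "object[]" is checked as a list, "object" as a dict.
REQUIRED_FIELDS: dict[str, type] = {
    "agent_id": str,
    "override_events": list,
    "latency_events": list,
    "delegation_graph": dict,
    "policy_window": dict,
}
OPTIONAL_FIELDS: dict[str, type] = {
    "operator_activity": list,
    "escalation_events": list,
}


def validate_mandate_input(payload: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems (empty means well-formed)."""
    problems: list[str] = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing required field: {name}")
        elif not isinstance(payload[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    # Optional fields may be absent, but must have the right type when present.
    for name, expected in OPTIONAL_FIELDS.items():
        if name in payload and not isinstance(payload[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every missing or mistyped field in a single pass.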

Output Schema

Field	Type	Description
`framework`	`string`	`mandate`
`version`	`string`	Scoring specification version.
`entity_id`	`string`	Evaluated agent identifier.
`score`	`number`	Mandate score from 0 to 100.
`risk_band`	`string`	Oversight risk classification.
`confidence`	`number`	Confidence in score quality (0 to 1).
`control_gaps`	`string[]`	Oversight gaps requiring remediation.
`recommended_action`	`string`	Suggested response (`monitor`, `review`, `restrict`).
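
A consumer of this output might model it as a typed record and reject out-of-range values on receipt. The sketch below assumes Python dataclasses. The class name `MandateResult` is hypothetical, while the fields, ranges, and allowed actions come from the table above.

```python
from dataclasses import dataclass

# Allowed values listed for recommended_action in the Output Schema table.
ALLOWED_ACTIONS = {"monitor", "review", "restrict"}


@dataclass(frozen=True)
class MandateResult:
    """Hypothetical container mirroring the Output Schema fields."""

    framework: str
    version: str
    entity_id: str
    score: float
    risk_band: str
    confidence: float
    control_gaps: list[str]
    recommended_action: str

    def __post_init__(self) -> None:
        # Enforce only the constraints the schema states explicitly.
        if self.framework != "mandate":
            raise ValueError("framework must be 'mandate'")
        if not 0 <= self.score <= 100:
            raise ValueError("score must be between 0 and 100")
        if not 0 <= self.confidence <= 1:
            raise ValueError("confidence must be between 0 and 1")
        if self.recommended_action not in ALLOWED_ACTIONS:
            raise ValueError("recommended_action must be monitor, review, or restrict")
```

The record is frozen so that a result cannot be altered after validation. Band names are not validated here because the schema describes `risk_band` only as a string.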

Score Interpretation

80-100

Strong Oversight

Interpretation: Oversight is effective and responsive.

Typical action: Standard monitoring.

60-79

Managed Risk

Interpretation: Oversight is usable with moderate weaknesses.

Typical action: Improve controls and monitor closely.

40-59

Material Gaps

Interpretation: Oversight reliability is materially degraded.

Typical action: Restrict high-risk operations.

0-39

Insufficient Control

Interpretation: Oversight is ineffective for the current risk profile.

Typical action: Suspend or block until controls improve.
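
The four bands above can be expressed as a simple lookup. The sketch below is illustrative only: the function name `classify` is hypothetical, and it assumes each band's lower bound is inclusive, so a fractional score such as 79.5 falls into the lower band. The spec lists integer ranges and does not say how fractional scores are handled.

```python
# (inclusive lower bound, band name, typical action), highest band first.
# Ranges and wording are copied from the Score Interpretation section.
BANDS = [
    (80, "Strong Oversight", "Standard monitoring."),
    (60, "Managed Risk", "Improve controls and monitor closely."),
    (40, "Material Gaps", "Restrict high-risk operations."),
    (0, "Insufficient Control", "Suspend or block until controls improve."),
]


def classify(score: float) -> tuple[str, str]:
    """Hypothetical helper: map a 0-100 Mandate score to (band, typical action)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, band, action in BANDS:
        if score >= lower_bound:
            return band, action
    raise AssertionError("unreachable: the last band starts at 0")
```

For example, `classify(66)` returns `("Managed Risk", "Improve controls and monitor closely.")`.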

Worked Example

Scenario: an enterprise evaluates three autonomous agents for oversight readiness.

Agent	Override	Latency	Visibility	Engagement	Escalation	Score	Band	Decision
Agent A	High	High	Medium	High	High	82	Strong Oversight	Continue production
Agent B	Medium	Medium	Medium	Medium	Medium	66	Managed Risk	Continue with controls
Agent C	Low	Low	Medium	Low	Low	31	Insufficient Control	Suspend high-risk tasks

Operational outcome:

Agent A remains in standard production monitoring.
Agent B enters a remediation plan with tighter alerts.
Agent C is restricted until intervention reliability improves.

Illustrative note: values and scores above are example outputs for documentation only.

Use Cases

Human-in-the-Loop Financial Controls

Validate that intervention windows remain effective in fraud, lending, and payment decision flows where delayed action creates outsized loss.

Healthcare and Clinical Workflow Oversight

Assess whether supervisors can observe and intervene in time-sensitive recommendations, especially when downstream outcomes affect patient safety.

Public Sector Operational Governance

Verify that delegated systems in service delivery and enforcement workflows remain controllable and auditable.

Autonomous Vendor Management

Benchmark external agent operators on practical control quality before expanding production scope.