# Mandate

Human oversight impact scoring across override effectiveness, latency, visibility, engagement, and escalation reliability.

- Last updated: Mar 6, 2026
- Layer: Agent (controllability assessment)
- Scale: 0-100
- Production Tier: Monitoring-Grade
## Purpose

Mandate determines whether human oversight over autonomous behavior is effective in practice, not just documented in policy. It translates control effectiveness into a single operational score used for governance and risk decisions.
## How It Works

### Emits
## Scoring Dimensions

### 1. Override Effectiveness
Measures how reliably agent behavior follows approved human overrides.

### 2. Intervention Latency
Measures whether interventions occur fast enough to prevent irreversible harm.

### 3. Visibility Depth
Measures how much of the delegation chain is observable to operators.

### 4. Engagement Quality
Measures operator attentiveness and decision quality under real workload.

### 5. Escalation Reliability
Measures whether high-risk events are escalated accurately and consistently.
Public note: exact formulas, thresholds, and calibration constants are intentionally withheld.
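The five dimensions above feed a single 0-100 composite. Because the real formulas and calibration constants are withheld, the sketch below assumes a plain weighted average with hypothetical weights; it illustrates the shape of the aggregation, not Mandate's actual scoring.

```python
# Illustrative only: Mandate's real weights and calibration constants are
# withheld. The weights below are hypothetical placeholders that sum to 1.0.
DIMENSION_WEIGHTS = {
    "override_effectiveness": 0.25,
    "intervention_latency": 0.25,
    "visibility_depth": 0.15,
    "engagement_quality": 0.15,
    "escalation_reliability": 0.20,
}

def composite_score(dimensions: dict[str, float]) -> float:
    """Combine five per-dimension scores (each 0-100) into one 0-100 score."""
    missing = DIMENSION_WEIGHTS.keys() - dimensions.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return round(
        sum(DIMENSION_WEIGHTS[name] * dimensions[name] for name in DIMENSION_WEIGHTS),
        1,
    )
```

Because the weights sum to 1.0, an agent scoring identically on every dimension keeps that value as its composite.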
## Input Schema

| Field | Type | Required | Description |
|---|---|---|---|
| agent_id | string | yes | Agent under oversight evaluation. |
| override_events | object[] | yes | Human override actions and outcomes. |
| latency_events | object[] | yes | Intervention timing evidence. |
| delegation_graph | object | yes | Delegation path and visibility metadata. |
| operator_activity | object[] | no | Signals for engagement and alert handling. |
| escalation_events | object[] | no | Escalation decisions and outcomes. |
| policy_window | object | yes | Risk-specific intervention windows. |
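A pre-submission check against the schema above can be sketched as follows. The type mapping mirrors the table (object becomes dict, object[] becomes list); the function itself is a hypothetical helper, not part of Mandate's published API.

```python
# Required/optional fields and coarse types taken from the input schema table.
REQUIRED_FIELDS = {
    "agent_id": str,
    "override_events": list,
    "latency_events": list,
    "delegation_graph": dict,
    "policy_window": dict,
}
OPTIONAL_FIELDS = {
    "operator_activity": list,
    "escalation_events": list,
}

def validate_input(payload: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the payload passes."""
    problems = []
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in payload:
            problems.append(f"missing required field: {field}")
        elif not isinstance(payload[field], ftype):
            problems.append(f"{field} should be {ftype.__name__}")
    for field, ftype in OPTIONAL_FIELDS.items():
        if field in payload and not isinstance(payload[field], ftype):
            problems.append(f"{field} should be {ftype.__name__}")
    return problems
```

This catches missing or mistyped fields before an evaluation run rather than partway through scoring.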
## Output Schema

| Field | Type | Description |
|---|---|---|
| framework | string | Constant identifier: "mandate". |
| version | string | Scoring specification version. |
| entity_id | string | Evaluated agent identifier. |
| score | number | Mandate score from 0 to 100. |
| risk_band | string | Oversight risk classification. |
| confidence | number | Confidence in score quality (0 to 1). |
| control_gaps | string[] | Oversight gaps requiring remediation. |
| recommended_action | string | Suggested response (monitor, review, restrict). |
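The output record can be modeled as a small typed structure mirroring the schema above. The field names and types come from the table; the class name, version string, and example values are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class MandateResult:
    """Result record mirroring the output schema; not an official type."""
    framework: str                   # constant identifier: "mandate"
    version: str                     # scoring specification version
    entity_id: str                   # evaluated agent identifier
    score: float                     # 0-100
    risk_band: str                   # oversight risk classification
    confidence: float                # 0-1
    control_gaps: list[str] = field(default_factory=list)
    recommended_action: str = "monitor"  # monitor | review | restrict

# Hypothetical result echoing Agent B from the worked example below.
example = MandateResult(
    framework="mandate",
    version="1.0",                   # invented version string
    entity_id="agent-b",
    score=66.0,
    risk_band="Managed Risk",
    confidence=0.8,
    control_gaps=["alert acknowledgement latency"],
    recommended_action="review",
)
```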
## Score Interpretation

## Worked Example
Scenario: an enterprise evaluates three autonomous agents for oversight readiness.
| Agent | Override | Latency | Visibility | Engagement | Escalation | Score | Band | Decision |
|---|---|---|---|---|---|---|---|---|
| Agent A | High | High | Medium | High | High | 82 | Strong Oversight | Continue production |
| Agent B | Medium | Medium | Medium | Medium | Medium | 66 | Managed Risk | Continue with controls |
| Agent C | Low | Low | Medium | Low | Low | 31 | Insufficient Control | Suspend high-risk tasks |
Operational outcome:
- Agent A remains in standard production monitoring.
- Agent B enters a remediation plan with tighter alerts.
- Agent C is restricted until intervention reliability improves.
Illustrative note: values and scores above are example outputs for documentation only.
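The score-to-band step in the table above can be sketched as a threshold lookup. The real Mandate thresholds are withheld; the boundaries below are hypothetical values chosen only so that the worked-example scores (82, 66, 31) land in their stated bands.

```python
# Hypothetical band floors; chosen to reproduce the worked example only.
BANDS = [
    (75, "Strong Oversight", "monitor"),
    (50, "Managed Risk", "review"),
    (0, "Insufficient Control", "restrict"),
]

def classify(score: float) -> tuple[str, str]:
    """Map a 0-100 Mandate score to (risk_band, recommended_action)."""
    for floor, band, action in BANDS:
        if score >= floor:
            return band, action
    return "Insufficient Control", "restrict"
```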
## Use Cases

### Human-in-the-Loop Financial Controls
Validate that intervention windows remain effective in fraud, lending, and payment decision flows where delayed action creates outsized loss.

### Healthcare and Clinical Workflow Oversight
Assess whether supervisors can observe and intervene in time-sensitive recommendations, especially when downstream outcomes affect patient safety.

### Public Sector Operational Governance
Verify that delegated systems in service delivery and enforcement workflows remain controllable and auditable.

### Autonomous Vendor Management
Benchmark external agent operators on practical control quality before expanding production scope.