Drift
Alignment degradation and shadow principal detection via gated geometric scoring.
Last updated Mar 6, 2026
Layer: Agent (principal-agent relationship)
Scale: 0-100
Production Tier: Monitoring-Grade
Version: 2.0
Purpose
Drift detects when agent behavior diverges from principal intent and when third-party incentives begin to shape outcomes. It provides early warning and operational containment guidance for alignment risk.
How It Works
Emits
Scoring Dimensions
1. Shadow Principal Detection (Critical Gate)
Detects whether observed outcomes track non-principal objectives.
2. Goal Fidelity
Measures outcome alignment with stated principal objectives.
3. Delegation Degradation
Measures alignment loss across multi-hop delegation chains.
4. Override Analysis
Measures how effectively human corrective interventions change agent behavior.
5. Preference Drift
Measures whether output behavior trends toward user intent or platform-default incentives.
Public note: exact formulas, internal thresholds, and calibration constants are intentionally withheld.
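Because the real formulas are withheld, the following is only a minimal sketch of what "gated geometric scoring" could look like in general, not Drift's actual method. It assumes each non-gate dimension is normalized to the range 0 to 1 (higher meaning better alignment), combines them with an unweighted geometric mean, and caps the result when the shadow-principal gate trips. The function name, the equal weighting, and the `gate_cap` value are all hypothetical.

```python
import math


def gated_geometric_score(
    dimensions: dict[str, float],
    gate_tripped: bool,
    gate_cap: float = 40.0,  # hypothetical cap, NOT a Drift constant
) -> float:
    """Illustrative gated geometric mean on a 0-100 scale.

    `dimensions` maps dimension names to values in [0, 1], where higher
    means better alignment. This is an assumption made for the sketch.
    """
    if not dimensions:
        raise ValueError("at least one dimension is required")
    for name, value in dimensions.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    # Geometric mean: a single weak dimension pulls the score down more
    # sharply than it would under an arithmetic mean.
    mean = math.prod(dimensions.values()) ** (1.0 / len(dimensions))
    score = 100.0 * mean

    # Gate: when the shadow-principal gate trips, the other dimensions
    # cannot lift the score above the cap.
    if gate_tripped:
        score = min(score, gate_cap)
    return score
```

The point of the geometric form is that strong dimensions cannot fully compensate for a weak one, and the point of the gate is that a critical shadow-principal finding bounds the score regardless of the rest.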
Input Schema
| Field | Type | Required | Description |
|---|---|---|---|
| agent_id | string | yes | Evaluated agent identifier. |
| objective_context | object | yes | Principal goals, constraints, and policy boundaries. |
| runtime_outcomes | object[] | yes | Observed outcomes and action traces. |
| delegation_chain | object | no | Delegation topology and handoff metadata. |
| override_events | object[] | no | Human correction and intervention outcomes. |
| preference_signals | object | no | User preference and default-alignment signals. |
| shadow_signals | object | no | Candidate shadow-objective correlation evidence. |
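As a rough illustration of the table above, the sketch below checks a payload's top-level fields against the documented names, types, and required flags. The validator itself and every key nested inside the objects (for example `goals` or `action`) are hypothetical; the documentation specifies only the top-level fields.

```python
from typing import Any

# Top-level fields and types taken from the Input Schema table.
REQUIRED_FIELDS = {
    "agent_id": str,
    "objective_context": dict,
    "runtime_outcomes": list,
}
OPTIONAL_FIELDS = {
    "delegation_chain": dict,
    "override_events": list,
    "preference_signals": dict,
    "shadow_signals": dict,
}


def validate_drift_input(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the shape is valid."""
    problems: list[str] = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing required field: {name}")
        elif not isinstance(payload[name], expected):
            problems.append(f"{name} must be {expected.__name__}")
    for name, expected in OPTIONAL_FIELDS.items():
        if name in payload and not isinstance(payload[name], expected):
            problems.append(f"{name} must be {expected.__name__}")
    # object[] fields must contain only objects (dicts).
    for name in ("runtime_outcomes", "override_events"):
        items = payload.get(name)
        if isinstance(items, list) and not all(isinstance(i, dict) for i in items):
            problems.append(f"{name} must contain only objects")
    return problems


# Hypothetical payload: nested keys are invented for illustration only.
example_payload = {
    "agent_id": "agent-b",
    "objective_context": {"goals": ["minimize procurement cost"]},
    "runtime_outcomes": [{"action": "route_order", "result": "accepted"}],
    "override_events": [],
}
```

Calling `validate_drift_input(example_payload)` returns an empty list for this payload, while a payload without `agent_id` yields a "missing required field" entry.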
Output Schema
| Field | Type | Description |
|---|---|---|
| framework | string | drift |
| version | string | Scoring specification version. |
| entity_id | string | Evaluated agent identifier. |
| score | number | Drift score from 0 to 100. |
| risk_band | string | Alignment risk classification. |
| confidence | number | Confidence in score quality (0 to 1). |
| shadow_risk_state | string | Shadow-principal risk state (low, elevated, critical). |
| recommended_action | string | allow, review, degrade, or block. |
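A consumer might want to reject malformed results before acting on them. The sketch below models the output fields as a dataclass and enforces only the ranges and enumerations stated in the table. The class name is hypothetical, and the example values (including the `risk_band` label, whose allowed values the table does not enumerate) are illustrative assumptions.

```python
from dataclasses import dataclass

# Enumerations as listed in the Output Schema table.
SHADOW_RISK_STATES = {"low", "elevated", "critical"}
RECOMMENDED_ACTIONS = {"allow", "review", "degrade", "block"}


@dataclass(frozen=True)
class DriftResult:
    framework: str
    version: str
    entity_id: str
    score: float
    risk_band: str
    confidence: float
    shadow_risk_state: str
    recommended_action: str

    def __post_init__(self) -> None:
        # Enforce only what the schema table states explicitly.
        if self.framework != "drift":
            raise ValueError(f"unexpected framework: {self.framework!r}")
        if not 0 <= self.score <= 100:
            raise ValueError("score must be between 0 and 100")
        if not 0 <= self.confidence <= 1:
            raise ValueError("confidence must be between 0 and 1")
        if self.shadow_risk_state not in SHADOW_RISK_STATES:
            raise ValueError(f"unknown shadow_risk_state: {self.shadow_risk_state!r}")
        if self.recommended_action not in RECOMMENDED_ACTIONS:
            raise ValueError(f"unknown recommended_action: {self.recommended_action!r}")


# Illustrative values only; not real Drift output.
example_result = DriftResult(
    framework="drift",
    version="2.0",
    entity_id="agent-c",
    score=24,
    risk_band="high",
    confidence=0.8,
    shadow_risk_state="critical",
    recommended_action="degrade",
)
```

Validating at construction time means downstream code can rely on `recommended_action` being one of the four documented values.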
Score Interpretation
Worked Example
Scenario: a coordination platform evaluates three agents handling procurement routing.
| Agent | Shadow Signal | Goal Fidelity | Delegation | Override | Preference Stability | Score | Risk Level | Decision |
|---|---|---|---|---|---|---|---|---|
| Agent A | Low | High | High | High | High | 88 | Low | Allow |
| Agent B | Medium | Medium | Medium | Medium | Medium | 61 | Moderate | Review with monitoring |
| Agent C | High | Low | Low | Medium | Low | 24 | High | Degrade and contain |
Operational outcome:
- Agent A remains trusted for standard routing.
- Agent B enters enhanced monitoring with tighter policy checks.
- Agent C is removed from high-impact routing until alignment improves.
Illustrative note: values and scores above are example outputs for documentation only.
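The operational outcomes above could be wired to the `recommended_action` output field roughly as follows. This is a sketch under assumptions: the function name and the handling text are hypothetical, the first three entries paraphrase the worked example, and the `block` entry is an assumed behavior that the example does not cover.

```python
# Maps each documented recommended_action value to an operational step.
ACTION_HANDLING = {
    "allow": "keep in standard routing",
    "review": "enable enhanced monitoring with tighter policy checks",
    "degrade": "remove from high-impact routing until alignment improves",
    "block": "suspend all routing",  # assumption: not shown in the worked example
}


def apply_recommended_action(agent_id: str, recommended_action: str) -> str:
    """Return a human-readable handling instruction for one agent."""
    try:
        handling = ACTION_HANDLING[recommended_action]
    except KeyError:
        # Fail loudly on unknown values rather than defaulting to "allow".
        raise ValueError(f"unknown recommended_action: {recommended_action!r}") from None
    return f"{agent_id}: {handling}"


# Decisions from the worked example, expressed as action values.
for agent, action in [("agent-a", "allow"), ("agent-b", "review"), ("agent-c", "degrade")]:
    print(apply_recommended_action(agent, action))
```

Raising on an unrecognized action is a deliberate choice in this sketch: an alignment control that silently treats unknown input as permissive would undercut its own purpose.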
Copyright 2024-2026 VaryOn Works, Inc.