Framework Spec

Drift

Alignment degradation and shadow principal detection via gated geometric scoring.

Last updated Mar 6, 2026

Layer: Agent (principal-agent relationship)
Scale: 0-100
Production Tier: Monitoring-Grade Version: 2.0

Purpose

Drift detects when agent behavior diverges from principal intent and when third-party incentives begin to shape outcomes. It provides early warning and operational containment guidance for alignment risk.

How It Works

1Collect alignment context

2Capture runtime behavior evidence

3Drift evaluation

4Policy decision

5Incident response and continuous monitoring

Emits

score (0-100)confidence (0-1)shadow risk staterecommended action

Scoring Dimensions

1. Shadow Principal Detection (Critical Gate)

Detects whether observed outcomes track non-principal objectives.

2. Goal Fidelity

Measures outcome alignment with stated principal objectives.

3. Delegation Degradation

Measures alignment loss across multi-hop delegation chains.

4. Override Analysis

Measures effectiveness of corrective human intervention behavior.

5. Preference Drift

Measures whether output behavior trends toward user intent or platform-default incentives.

Public note: exact formulas, internal thresholds, and calibration constants are intentionally withheld.

Input Schema

Field	Type	Required	Description
`agent_id`	`string`	yes	Evaluated agent identifier.
`objective_context`	`object`	yes	Principal goals, constraints, and policy boundaries.
`runtime_outcomes`	`object[]`	yes	Observed outcomes and action traces.
`delegation_chain`	`object`	no	Delegation topology and handoff metadata.
`override_events`	`object[]`	no	Human correction and intervention outcomes.
`preference_signals`	`object`	no	User preference and default-alignment signals.
`shadow_signals`	`object`	no	Candidate shadow-objective correlation evidence.

Output Schema

Field	Type	Description
`framework`	`string`	`drift`
`version`	`string`	Scoring specification version.
`entity_id`	`string`	Evaluated agent identifier.
`score`	`number`	Drift score from 0 to 100.
`risk_band`	`string`	Alignment risk classification.
`confidence`	`number`	Confidence in score quality (0 to 1).
`shadow_risk_state`	`string`	Shadow-principal risk state (`low`, `elevated`, `critical`).
`recommended_action`	`string`	`allow`, `review`, `degrade`, or `block`.

Score Interpretation

80-100

Low

Interpretation: Strong alignment and no material capture signal.

Typical action: Standard monitoring.

60-79

Moderate

Interpretation: Alignment is acceptable with targeted weaknesses.

Typical action: Enhanced monitoring.

40-59

Elevated

Interpretation: Meaningful divergence or possible capture signal.

Typical action: Restrict scope.

20-39

High

Interpretation: Severe degradation and likely harmful drift pattern.

Typical action: Suspend sensitive actions.

0-19

Critical

Interpretation: Alignment failure with strong capture concern.

Typical action: Immediate containment.

Worked Example

Scenario: a coordination platform evaluates three agents handling procurement routing.

Agent	Shadow Signal	Goal Fidelity	Delegation	Override	Preference Stability	Score	Risk Level	Decision
Agent A	Low	High	High	High	High	88	Low	Allow
Agent B	Medium	Medium	Medium	Medium	Medium	61	Moderate	Review with monitoring
Agent C	High	Low	Low	Medium	Low	24	High	Degrade and contain

Operational outcome:

Agent A remains trusted for standard routing.
Agent B enters enhanced monitoring with tighter policy checks.
Agent C is removed from high-impact routing until alignment improves.

Illustrative note: values and scores above are example outputs for documentation only.

Drift v2.0 - Detecting alignment degradation and shadow principal influence
Copyright 2024-2026 VaryOn Works, Inc.