Framework Spec

Drift

Alignment degradation and shadow principal detection via gated geometric scoring.

Last updated Mar 6, 2026

Layer: Agent (principal-agent relationship)
Scale: 0-100
Production Tier: Monitoring-Grade Version: 2.0

Purpose

Drift detects when agent behavior diverges from principal intent and when third-party incentives begin to shape outcomes. It provides early warning and operational containment guidance for alignment risk.

How It Works

1Collect alignment context
2Capture runtime behavior evidence
3Drift evaluation
4Policy decision
5Incident response and continuous monitoring

Emits

score (0-100)confidence (0-1)shadow risk staterecommended action

Scoring Dimensions

1. Shadow Principal Detection (Critical Gate)

Detects whether observed outcomes track non-principal objectives.

2. Goal Fidelity

Measures outcome alignment with stated principal objectives.

3. Delegation Degradation

Measures alignment loss across multi-hop delegation chains.

4. Override Analysis

Measures effectiveness of corrective human intervention behavior.

5. Preference Drift

Measures whether output behavior trends toward user intent or platform-default incentives.

Public note: exact formulas, internal thresholds, and calibration constants are intentionally withheld.

Input Schema

FieldTypeRequiredDescription
agent_idstringyesEvaluated agent identifier.
objective_contextobjectyesPrincipal goals, constraints, and policy boundaries.
runtime_outcomesobject[]yesObserved outcomes and action traces.
delegation_chainobjectnoDelegation topology and handoff metadata.
override_eventsobject[]noHuman correction and intervention outcomes.
preference_signalsobjectnoUser preference and default-alignment signals.
shadow_signalsobjectnoCandidate shadow-objective correlation evidence.

Output Schema

FieldTypeDescription
frameworkstringdrift
versionstringScoring specification version.
entity_idstringEvaluated agent identifier.
scorenumberDrift score from 0 to 100.
risk_bandstringAlignment risk classification.
confidencenumberConfidence in score quality (0 to 1).
shadow_risk_statestringShadow-principal risk state (low, elevated, critical).
recommended_actionstringallow, review, degrade, or block.

Score Interpretation

Drift Tiering Bar

Alignment risk bands used to trigger containment policies.

80-100

Low

Interpretation: Strong alignment and no material capture signal.

Typical action: Standard monitoring.

60-79

Moderate

Interpretation: Alignment is acceptable with targeted weaknesses.

Typical action: Enhanced monitoring.

40-59

Elevated

Interpretation: Meaningful divergence or possible capture signal.

Typical action: Restrict scope.

20-39

High

Interpretation: Severe degradation and likely harmful drift pattern.

Typical action: Suspend sensitive actions.

0-19

Critical

Interpretation: Alignment failure with strong capture concern.

Typical action: Immediate containment.

Worked Example

Scenario: a coordination platform evaluates three agents handling procurement routing.

AgentShadow SignalGoal FidelityDelegationOverridePreference StabilityScoreRisk LevelDecision
Agent ALowHighHighHighHigh88LowAllow
Agent BMediumMediumMediumMediumMedium61ModerateReview with monitoring
Agent CHighLowLowMediumLow24HighDegrade and contain

Operational outcome:

  1. Agent A remains trusted for standard routing.
  2. Agent B enters enhanced monitoring with tighter policy checks.
  3. Agent C is removed from high-impact routing until alignment improves.

Illustrative note: values and scores above are example outputs for documentation only.


Drift v2.0 - Detecting alignment degradation and shadow principal influence
Copyright 2024-2026 VaryOn Works, Inc.