compliance-drift-evals - SKILL.md Agent Skill

name: compliance-drift-evals description: Set up compliance exports, drift detection, evaluations, scoring, and learning analytics license: MIT metadata: author: ucsandman version: "1.0.0" category: analytics

Compliance, Drift, Evaluations & Learning

DashClaw's analytical capabilities for governance evidence, behavioral monitoring, and agent quality tracking.

Compliance Exports

Generate audit-ready evidence bundles for regulatory frameworks.

Supported Frameworks

Framework	ID	Description
SOC 2	`soc2`	Service Organization Control
NIST AI RMF	`nist-ai-rmf`	AI Risk Management Framework
EU AI Act	`eu-ai-act`	European AI regulation
ISO 42001	`iso42001`	AI Management System

Create an Export

// V1 SDK
const exp = await claw.createComplianceExport({
  name: 'Q1 2026 SOC 2 Audit',
  frameworks: ['soc2'],
  format: 'json',        // or 'md'
  window_days: 90,
  include_evidence: true,
  include_remediation: true,
  include_trends: true
});

# API
curl -X POST "$DASHCLAW_BASE_URL/api/compliance/exports" \
  -H "x-api-key: $DASHCLAW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name":"Q1 Audit","frameworks":["soc2"],"window_days":90}'

Scheduled Exports

await claw.createComplianceSchedule({
  name: 'Weekly SOC 2',
  frameworks: ['soc2'],
  cron_expression: '0 9 * * 1',  // Every Monday at 9am
  window_days: 7,
  include_evidence: true
});

Gap Analysis

const gaps = await claw.analyzeGaps('soc2');
// Returns: missing controls, partial coverage, recommendations

Coverage Trends

const trends = await claw.getComplianceTrends({ framework: 'soc2', limit: 12 });
// Monthly coverage scores over time

Drift Detection

Statistical behavioral drift detection using z-scores. Pure math — no LLM required.

6 Tracked Metrics

Metric	What It Measures
`risk_score`	Are actions getting riskier?
`confidence`	Is agent confidence dropping?
`duration_ms`	Are actions taking longer?
`cost_estimate`	Are costs increasing?
`tokens_total`	Is token usage growing?
`learning_score`	Is the agent learning?

Severity Thresholds

z-score	Severity	Meaning
≥ 1.5	info	Notable deviation
≥ 2.0	warning	Significant drift
≥ 3.0	critical	Severe anomaly

Compute Baselines

// Establish baseline from last 30 days
await claw.computeDriftBaselines({
  agent_id: 'my-agent',
  lookback_days: 30
});

Detect Drift

const drift = await claw.detectDrift({
  agent_id: 'my-agent',
  window_days: 7
});

// drift.alerts: [{ metric, z_score, severity, current_value, baseline_mean }]

Acknowledge Alerts

await claw.acknowledgeDriftAlert(alertId);

Monitor Drift Stats

const stats = await claw.getDriftStats({ agent_id: 'my-agent' });
// { total_alerts, unacknowledged, by_severity, by_metric }

Evaluations

Score agent outputs using 5 built-in scorer types.

Scorer Types

Type	LLM Required	Description
`regex`	No	Pattern matching against output
`contains`	No	Keyword/phrase detection
`numeric_range`	No	Value within expected range
`custom_function`	No	Arbitrary JavaScript logic
`llm_judge`	Yes (optional)	LLM-based quality assessment

Create a Scorer

// Regex scorer — check for PII
await claw.createScorer({
  name: 'no-pii-in-output',
  scorerType: 'regex',
  config: {
    pattern: '\\b\\d{3}-\\d{2}-\\d{4}\\b',  // SSN pattern
    invert: true  // Score 1 if NOT found (good)
  },
  description: 'Ensures no SSN patterns in output'
});

// Numeric range scorer
await claw.createScorer({
  name: 'response-time-check',
  scorerType: 'numeric_range',
  config: {
    field: 'duration_ms',
    min: 0,
    max: 5000
  }
});

Score an Action

await claw.createScore({
  actionId: 'ar_abc123',
  scorerName: 'no-pii-in-output',
  score: 1.0,        // 0-1 scale
  label: 'pass',
  reasoning: 'No PII patterns detected'
});

Batch Evaluation Run

const run = await claw.createEvalRun({
  name: 'Weekly quality check',
  scorerId: 'sc_abc123',
  actionFilters: { days: 7 }
});
// Scores all matching actions from the last 7 days

Scoring Profiles

Multi-dimensional risk and quality scoring with auto-calibration.

Create a Profile

await claw.createScoringProfile({
  name: 'deploy-quality',
  description: 'Quality scoring for deployment actions',
  composite_method: 'weighted_average',  // or: minimum, geometric_mean
  dimensions: [
    {
      name: 'risk',
      weight: 0.4,
      source: 'risk_score',
      scale: [
        { min: 0, max: 40, label: 'low', score: 1.0 },
        { min: 40, max: 70, label: 'medium', score: 0.6 },
        { min: 70, max: 100, label: 'high', score: 0.2 }
      ]
    },
    {
      name: 'speed',
      weight: 0.3,
      source: 'duration_ms',
      scale: [
        { min: 0, max: 5000, label: 'fast', score: 1.0 },
        { min: 5000, max: 30000, label: 'normal', score: 0.7 },
        { min: 30000, max: null, label: 'slow', score: 0.3 }
      ]
    },
    {
      name: 'cost',
      weight: 0.3,
      source: 'cost_estimate',
      scale: [
        { min: 0, max: 1, label: 'cheap', score: 1.0 },
        { min: 1, max: 10, label: 'moderate', score: 0.6 },
        { min: 10, max: null, label: 'expensive', score: 0.2 }
      ]
    }
  ]
});

Auto-Calibration

const suggestions = await claw.autoCalibrate({
  lookback_days: 30
});
// Returns percentile-based scale suggestions from historical data

Risk Templates

Replace hardcoded risk scores with rule-based computation:

await claw.createRiskTemplate({
  name: 'deploy-risk',
  base_risk: 50,
  rules: [
    { field: 'systems_touched', operator: 'contains', value: 'production', add: 30 },
    { field: 'reversible', operator: '==', value: false, add: 20 },
    { field: 'metadata.has_rollback', operator: '==', value: true, add: -15 }
  ]
});

Learning Analytics

Track agent improvement over time. DashClaw's unique moat.

Maturity Model

Level	Episodes	Success Rate	Avg Score
Novice	0+	any	any
Developing	10+	40%+	40+
Competent	50+	60%+	55+
Proficient	150+	75%+	65+
Expert	500+	85%+	75+
Master	1000+	92%+	85+

Compute Learning Velocity

const velocity = await claw.computeLearningVelocity({
  agent_id: 'my-agent',
  lookback_days: 90,
  period: 'weekly'
});
// Linear regression slope of performance over time

Learning Curves

const curves = await claw.computeLearningCurves({
  agent_id: 'my-agent',
  lookback_days: 180
});
// Per-action-type learning curves showing improvement trajectory

Analytics Summary

const summary = await claw.getLearningAnalyticsSummary({
  agent_id: 'my-agent'
});
// { maturity_level, velocity, total_episodes, success_rate, avg_score }