
name: ai-finops-governance
description: Governs AI agent development with cost-effectiveness, monitoring, and security guardrails. Use when building agents, making LLM API calls, handling secrets, or designing AI-driven workflows to ensure cost control and compound customer value.

AI FinOps & Governance

Core Principles

When building AI-driven agents and workflows:

Cost-effectiveness first: Every token has a cost; every API call impacts margins
Customer value compounds: AI should amplify outcomes, not just automate
Fail fast, fail cheap: Validate before scaling
Secure by default: Never commit secrets, never log keys
Observable always: What you can't measure, you can't optimize

Token Budget Management

Model Selection Strategy

Choose models based on task complexity and cost:

| Task Type | Model | Input cost per 1M tokens (USD) | When to Use |
|-----------|-------|--------------------------------|-------------|
| Simple classification | gpt-4o-mini | $0.15 | Binary decisions, routing |
| Structured extraction | gpt-4o-mini | $0.15 | Parsing, validation |
| Reasoning & planning | gpt-4o | $2.50 | Complex workflows |
| Long context analysis | o1-mini | $3.00 | Deep analysis required |

Cost Control Checklist

Before implementing AI features:

Estimated tokens per request (input + output)
Expected request volume per day/month
Total monthly cost projection
Fallback strategy if budget exceeded
Caching strategy for repeated queries
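The first three checklist items reduce to one multiplication. A minimal sketch, assuming per-million-token prices are supplied by the caller; the `projectMonthlyCost` name, the 30-day month, and the traffic figures are illustrative assumptions rather than measurements:

```typescript
// Hypothetical helper: projects monthly spend from per-request token estimates.
interface CostInputs {
  inputTokensPerRequest: number;
  outputTokensPerRequest: number;
  requestsPerDay: number;
  inputPricePerMillion: number;  // USD per 1M input tokens
  outputPricePerMillion: number; // USD per 1M output tokens
  daysPerMonth?: number;         // assumed to be 30 when omitted
}

function projectMonthlyCost(c: CostInputs): number {
  const days = c.daysPerMonth ?? 30;
  const costPerRequest =
    (c.inputTokensPerRequest * c.inputPricePerMillion +
      c.outputTokensPerRequest * c.outputPricePerMillion) / 1_000_000;
  return costPerRequest * c.requestsPerDay * days;
}

// Example: 400 input + 100 output tokens per request, 10,000 requests/day,
// priced at $0.15 (input) and $0.60 (output) per 1M tokens.
const monthlyCost = projectMonthlyCost({
  inputTokensPerRequest: 400,
  outputTokensPerRequest: 100,
  requestsPerDay: 10_000,
  inputPricePerMillion: 0.15,
  outputPricePerMillion: 0.60,
});
console.log(`Projected monthly cost: $${monthlyCost.toFixed(2)}`); // $36.00
```

Pricing input and output tokens separately matters because output tokens are typically billed at a higher rate than input tokens.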

Token Optimization Techniques

1. Prompt Engineering

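Keep static instructions in one stable system message that is reused across requests
Minimize few-shot examples; rely on model capabilities where output quality holds
Use structured outputs (JSON mode) to avoid verbose, free-form responses
Keep unchanging context in the system message instead of rebuilding it per request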

2. Context Management

// ❌ Bad - includes unnecessary context
const prompt = `Here is the entire user history: ${JSON.stringify(history)}
Now analyze their last message: ${lastMessage}`;

// ✅ Good - only relevant context
const prompt = `Last 3 interactions: ${recentHistory}
Current message: ${lastMessage}`;

3. Streaming & Early Termination

// Stream responses to detect completion early
const stream = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages,
  stream: true,
  max_tokens: 500, // Hard limit prevents runaway costs
});

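// Accumulate streamed text and stop early once the answer is sufficient
let accumulated = '';
for await (const chunk of stream) {
  accumulated += chunk.choices[0]?.delta?.content ?? '';
  if (isAnswerComplete(accumulated)) break; // isAnswerComplete is your own completion check
}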
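The `isAnswerComplete` check is left to the application. One possible sketch, assuming the agent is asked for a single JSON object: treat the answer as complete as soon as the accumulated text parses. The chunk values below simulate streamed deltas and are purely illustrative.

```typescript
// Hypothetical completion check for JSON answers: the text counts as
// complete once it parses as a JSON object.
function isAnswerComplete(accumulated: string): boolean {
  const text = accumulated.trim();
  if (!text.startsWith("{") || !text.endsWith("}")) return false;
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false; // still a partial object
  }
}

// Simulated stream chunks standing in for real API deltas
const chunks = ['{"intent":', '"book",', '"zip":"00001"}', " Anything else?"];
let accumulated = "";
let consumed = 0;
for (const chunk of chunks) {
  accumulated += chunk;
  consumed++;
  if (isAnswerComplete(accumulated)) break; // the trailing chatter is never read
}
console.log(consumed, accumulated);
```

Re-parsing on every chunk is acceptable for short answers capped by `max_tokens`; a longer response may warrant a cheaper check such as brace counting.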

Secure Secret Management

CRITICAL: Never Commit Secrets

All API keys and secrets MUST be:

Loaded from environment variables
Validated at startup with zod/joi
Redacted in all logs
Injected via platform-specific secret managers

Standard Pattern: Environment Configuration

Use this exact pattern for all services:

1. Environment Schema (src/config/env.ts)

import { z } from "zod";

const EnvSchema = z.object({
  NODE_ENV: z.string().default("development"),
  
  // OpenAI
  OPENAI_API_KEY: z.string().min(20, "OPENAI_API_KEY required"),
  OPENAI_MODEL: z.string().default("gpt-4o-mini"),
  OPENAI_TIMEOUT_MS: z.coerce.number().default(45_000),
  OPENAI_MAX_RETRIES: z.coerce.number().default(3),
  
  // Anthropic (if using Claude)
  ANTHROPIC_API_KEY: z.string().optional(),
  
  // Cost controls
  DAILY_TOKEN_BUDGET: z.coerce.number().default(1_000_000),
  ALERT_THRESHOLD_PCT: z.coerce.number().default(80),
});

export type AppEnv = z.infer<typeof EnvSchema>;

export const env: AppEnv = (() => {
  const parsed = EnvSchema.safeParse(process.env);
  if (!parsed.success) {
    const issues = parsed.error.issues
      .map(i => `${i.path.join(".")}: ${i.message}`)
      .join("\n");
    throw new Error(`❌ Invalid environment:\n${issues}`);
  }
  return parsed.data;
})();

2. Secret Redaction (src/lib/redact.ts)

const SECRET_PATTERNS = [
  /sk-[a-zA-Z0-9_-]{20,}/g,         // OpenAI keys (including sk-proj-... project keys)
  /sk-ant-[a-zA-Z0-9-]{20,}/g,      // Anthropic keys
  /Bearer [a-zA-Z0-9._-]+/g,        // Bearer tokens
];

export function redactSecrets(input: unknown): unknown {
  if (typeof input === "string") {
    let redacted = input;
    SECRET_PATTERNS.forEach(pattern => {
      redacted = redacted.replace(pattern, "[REDACTED]");
    });
    return redacted;
  }
  
  if (typeof input === "object" && input !== null) {
    const json = JSON.stringify(input);
    const redacted = redactSecrets(json);
    return JSON.parse(redacted as string);
  }
  
  return input;
}

// Wrap all loggers
export const logger = {
  info: (msg: unknown) => console.log(redactSecrets(msg)),
  error: (msg: unknown) => console.error(redactSecrets(msg)),
  warn: (msg: unknown) => console.warn(redactSecrets(msg)),
};
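Pattern matching only catches key formats it already knows. A complementary sketch, which is an addition rather than part of the pattern above, is to redact by field name as well. The `redactByKey` name and the key list are illustrative assumptions:

```typescript
// Hypothetical field names to treat as sensitive (compared case-insensitively)
const SENSITIVE_KEYS = new Set([
  "apikey", "api_key", "authorization", "password", "token", "secret",
]);

// Returns a deep copy with the values of sensitive fields replaced
function redactByKey(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redactByKey);
  if (typeof value === "object" && value !== null) {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase())
        ? "[REDACTED]"
        : redactByKey(inner);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}

const sample = redactByKey({
  model: "gpt-4o-mini",
  headers: { Authorization: "opaque-value-without-a-known-prefix" },
});
console.log(JSON.stringify(sample));
```

Field-name redaction hides a credential even when its format is unknown, while the regex patterns still catch keys that leak into free-text messages, so the two approaches are best combined.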

3. Centralized Client (src/lib/openai.ts)

import OpenAI from "openai";
import { env } from "../config/env";

export const openai = new OpenAI({
  apiKey: env.OPENAI_API_KEY,
  timeout: env.OPENAI_TIMEOUT_MS,
  maxRetries: env.OPENAI_MAX_RETRIES,
});

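// Wrapper with retry logic and error handling.
// Note: the SDK client above already retries up to maxRetries on its own, so
// wrapping calls here multiplies attempts. Set maxRetries: 0 on the client if
// this wrapper should own all retries.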
export async function withRetries<T>(fn: () => Promise<T>): Promise<T> {
  const maxAttempts = env.OPENAI_MAX_RETRIES;
  let lastError: unknown;
  
  for (let attempt = 0; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      lastError = err;
      const status = err?.status;
      
      // Don't retry client errors (except rate limits)
      if (status && status < 500 && status !== 429) break;
      
      if (attempt < maxAttempts) {
        const backoff = Math.min(2000 * Math.pow(2, attempt), 15_000);
        await new Promise(r => setTimeout(r, backoff));
      }
    }
  }
  
  // Never log the raw error (might contain keys)
  const status = (lastError as any)?.status;
  throw new Error(`OpenAI request failed (status ${status ?? "unknown"})`);
}
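To make the wait times concrete, the backoff formula used in `withRetries` can be pulled out and inspected. The sketch below reuses the same constants; the jitter variant is an addition that the wrapper above does not implement, and both function names are hypothetical.

```typescript
// Same formula as withRetries: 2s base, doubling per attempt, capped at 15s
function backoffMs(attempt: number, baseMs = 2000, capMs = 15_000): number {
  return Math.min(baseMs * Math.pow(2, attempt), capMs);
}

// Optional "full jitter": pick a random wait in [0, backoff) so that many
// clients recovering at once do not retry in lockstep.
function backoffWithJitterMs(
  attempt: number,
  random: () => number = Math.random,
): number {
  return Math.floor(random() * backoffMs(attempt));
}

const schedule = [0, 1, 2, 3].map((attempt) => backoffMs(attempt));
console.log(schedule); // [2000, 4000, 8000, 15000]
```

With the default `OPENAI_MAX_RETRIES` of 3, the wrapper sleeps after attempts 0, 1, and 2, so a request that keeps failing adds 14 seconds of waiting on top of the request timeouts themselves.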

Deployment Secret Injection

Local Development (.env)

# .env (NEVER COMMIT - add to .gitignore)
OPENAI_API_KEY="sk-..."
OPENAI_MODEL="gpt-4o-mini"

Docker Compose (inject from shell)

services:
  agent-service:
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - OPENAI_MODEL=${OPENAI_MODEL}

Kubernetes (Secret + envFrom)

apiVersion: v1
kind: Secret
metadata:
  name: ai-secrets
type: Opaque
stringData:
  OPENAI_API_KEY: "${REPLACE_IN_CI}"
---
apiVersion: apps/v1
kind: Deployment
# metadata, selector, and container image omitted for brevity
spec:
  template:
    spec:
      containers:
      - name: agent
        envFrom:
        - secretRef:
            name: ai-secrets

GitHub Actions

env:
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

Cost Monitoring & Observability

Real-Time Budget Tracking

Implement token tracking middleware:

// src/lib/tokenTracker.ts
import { env } from "../config/env";
import { logger } from "./redact";

interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  estimatedCost: number;
  timestamp: Date;
}

class TokenBudgetTracker {
  private dailyUsage: TokenUsage[] = [];
  
  async track(usage: TokenUsage): Promise<void> {
    this.dailyUsage.push(usage);
    
    const dailyTotal = this.getDailyTotal();
    const budget = env.DAILY_TOKEN_BUDGET;
    const pctUsed = (dailyTotal / budget) * 100;
    
    if (pctUsed >= env.ALERT_THRESHOLD_PCT) {
      await this.alertBudgetThreshold(pctUsed, dailyTotal);
    }
    
    if (dailyTotal >= budget) {
      throw new Error(`Daily token budget exceeded: ${dailyTotal}/${budget}`);
    }
  }
  
  getDailyTotal(): number {
    const today = new Date().toDateString();
    return this.dailyUsage
      .filter(u => new Date(u.timestamp).toDateString() === today)
      .reduce((sum, u) => sum + u.totalTokens, 0);
  }
  
  private async alertBudgetThreshold(pct: number, used: number) {
    // Send alert via Slack/email/PagerDuty
    logger.warn({
      alert: "TOKEN_BUDGET_THRESHOLD",
      percentUsed: pct,
      tokensUsed: used,
      budget: env.DAILY_TOKEN_BUDGET,
    });
  }
}

export const tokenTracker = new TokenBudgetTracker();
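The tracker above records usage after a call has been paid for, so the budget error fires one request too late. A possible complement, sketched here under the assumption that each agent can supply a rough token estimate, is to reserve tokens before the call and reconcile afterwards. `BudgetGate` is a hypothetical name, and the daily reset is left out for brevity.

```typescript
// Hypothetical pre-flight guard: reserve estimated tokens before the call,
// then replace the estimate with the actual usage once the API responds.
class BudgetGate {
  private used = 0;

  constructor(private readonly dailyBudget: number) {}

  // Returns false (and reserves nothing) if the estimate would exceed the budget
  tryReserve(estimatedTokens: number): boolean {
    if (this.used + estimatedTokens > this.dailyBudget) return false;
    this.used += estimatedTokens;
    return true;
  }

  // Swap the reserved estimate for the real token count
  reconcile(estimatedTokens: number, actualTokens: number): void {
    this.used += actualTokens - estimatedTokens;
  }

  get remaining(): number {
    return Math.max(this.dailyBudget - this.used, 0);
  }
}

const gate = new BudgetGate(1_000);
const firstAllowed = gate.tryReserve(600);  // true: 600 <= 1000
gate.reconcile(600, 450);                   // the call actually used 450
const secondAllowed = gate.tryReserve(600); // false: 450 + 600 > 1000
console.log(firstAllowed, secondAllowed, gate.remaining); // true false 550
```

When `tryReserve` returns false, the caller can apply its fallback strategy (a cached answer, a rules-based path, or a deferred job) instead of failing after the spend.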

Logging Best Practices

What to Log:

Request ID (for tracing)
Model used
Token counts (prompt/completion/total)
Latency (ms)
Cost estimate
Success/failure status

What NOT to Log:

User PII (names, emails, phone numbers)
API keys or secrets
Full conversation history
Raw error objects (may contain keys)

Example Structured Log:

logger.info({
  requestId: uuid(),
  agent: "quote-calculator",
  model: "gpt-4o-mini",
  promptTokens: 120,
  completionTokens: 45,
  totalTokens: 165,
  estimatedCost: 0.000045, // 120 input * $0.15/1M + 45 output * $0.60/1M
  latencyMs: 1240,
  success: true,
  timestamp: new Date().toISOString(),
});

Agent Development Guardrails

When to Use AI vs. Rules Engine

Use deterministic rules for:

Simple classification (zip code → service area)
Mathematical calculations (lot size → price)
Policy enforcement (business hours, service caps)

Use AI agents for:

Natural language understanding
Context-aware routing
Complex reasoning with ambiguity
Adaptive responses based on conversation flow
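The split above can be expressed as a rules-first router: try the deterministic lookup, and call the model only when no rule matches. This is a minimal sketch; the zip codes, zone names, and the `classifyWithAi` stub are placeholders, not real data or a real API call.

```typescript
type Route = { serviceArea: string; source: "rules" | "ai" };

// Hypothetical lookup table; real data would come from configuration
const SERVICE_AREAS: Record<string, string> = {
  "00001": "zone-a",
  "00002": "zone-b",
};

// Deterministic path: costs no tokens
function matchRule(message: string): string | undefined {
  const zip = message.match(/\b\d{5}\b/)?.[0];
  return zip ? SERVICE_AREAS[zip] : undefined;
}

// Stand-in for an LLM-backed classifier (assumption: replace with a real agent)
async function classifyWithAi(_message: string): Promise<string> {
  return "needs-review";
}

async function routeRequest(message: string): Promise<Route> {
  const area = matchRule(message);
  if (area) return { serviceArea: area, source: "rules" };
  // Only ambiguous messages reach the model
  return { serviceArea: await classifyWithAi(message), source: "ai" };
}

routeRequest("Do you serve 00002?").then((route) => console.log(route));
```

Recording `source` on each result makes it easy to measure how much traffic the rules absorb before any tokens are spent.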

Agent Architecture Pattern

// Standard agent interface for consistency
interface Agent<TInput, TOutput> {
  name: string;
  model: string;
  estimatedTokensPerCall: number;
  maxRetries: number;
  
  execute(input: TInput): Promise<TOutput>;
  validate(output: unknown): TOutput; // Type guard + validation
}

// Example implementation
class QuoteCalculatorAgent implements Agent<QuoteInput, QuoteOutput> {
  name = "quote-calculator";
  model = env.OPENAI_MODEL;
  estimatedTokensPerCall = 500;
  maxRetries = 2;
  
  async execute(input: QuoteInput): Promise<QuoteOutput> {
    const start = Date.now();
    
    try {
      const response = await withRetries(() => 
        openai.chat.completions.create({
          model: this.model,
          messages: this.buildMessages(input),
          response_format: { type: "json_object" },
          max_tokens: 300, // Cost control
        })
      );
      
      const usage = response.usage!;
      await tokenTracker.track({
        promptTokens: usage.prompt_tokens,
        completionTokens: usage.completion_tokens,
        totalTokens: usage.total_tokens,
        estimatedCost: this.calculateCost(usage),
        timestamp: new Date(),
      });
      
      const output = JSON.parse(response.choices[0].message.content!);
      return this.validate(output);
      
    } catch (err) {
      logger.error({
        agent: this.name,
        error: "Agent execution failed",
        latencyMs: Date.now() - start,
      });
      throw err;
    }
  }
  
  validate(output: unknown): QuoteOutput {
    // Use zod for runtime validation
    return QuoteOutputSchema.parse(output);
  }
  
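  private calculateCost(usage: { prompt_tokens: number; completion_tokens: number }): number {
    // Model-specific pricing in USD per 1M tokens
    const pricing: Record<string, { input: number; output: number }> = {
      "gpt-4o-mini": { input: 0.15, output: 0.60 },
      "gpt-4o": { input: 2.50, output: 10.00 },
    };
    
    const rates = pricing[this.model];
    if (!rates) {
      // OPENAI_MODEL is configurable, so fail clearly on a model without a price entry
      throw new Error(`No pricing configured for model "${this.model}"`);
    }
    return (
      (usage.prompt_tokens * rates.input / 1_000_000) +
      (usage.completion_tokens * rates.output / 1_000_000)
    );
  }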
  
  private buildMessages(input: QuoteInput): any[] {
    // Keep prompts minimal and focused
    return [
      { role: "system", content: "You are a lawn care quote calculator." },
      { role: "user", content: JSON.stringify(input) },
    ];
  }
}

Caching Strategy

Implement caching for repeated queries:

import { createHash } from "crypto";

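class CachedAgent<TInput, TOutput> implements Agent<TInput, TOutput> {
  constructor(
    private innerAgent: Agent<TInput, TOutput>,
    private cache: Map<string, { output: TOutput, timestamp: Date }>,
    private ttlMs: number = 3600_000 // 1 hour default
  ) {}
  
  // Delegate the rest of the Agent interface to the wrapped agent
  get name() { return this.innerAgent.name; }
  get model() { return this.innerAgent.model; }
  get estimatedTokensPerCall() { return this.innerAgent.estimatedTokensPerCall; }
  get maxRetries() { return this.innerAgent.maxRetries; }
  
  validate(output: unknown): TOutput {
    return this.innerAgent.validate(output);
  }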
  
  async execute(input: TInput): Promise<TOutput> {
    const cacheKey = this.getCacheKey(input);
    const cached = this.cache.get(cacheKey);
    
    if (cached && Date.now() - cached.timestamp.getTime() < this.ttlMs) {
      logger.info({ agent: this.innerAgent.name, cacheHit: true });
      return cached.output;
    }
    
    const output = await this.innerAgent.execute(input);
    this.cache.set(cacheKey, { output, timestamp: new Date() });
    
    return output;
  }
  
  private getCacheKey(input: TInput): string {
    return createHash("sha256")
      .update(JSON.stringify(input))
      .digest("hex");
  }
}
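One caveat with hashing `JSON.stringify(input)` directly: two inputs with the same fields in a different property order produce different keys and therefore miss the cache. A possible refinement, assuming inputs are plain JSON-serializable values, is to sort keys before hashing. `canonicalize` and `stableCacheKey` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Recursively rebuild objects with sorted keys so that property order
// does not affect the serialized form.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (typeof value === "object" && value !== null) {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      sorted[key] = canonicalize(source[key]);
    }
    return sorted;
  }
  return value;
}

// Assumes `input` is JSON-serializable (no undefined, functions, or cycles)
function stableCacheKey(input: unknown): string {
  return createHash("sha256")
    .update(JSON.stringify(canonicalize(input)))
    .digest("hex");
}

const keyA = stableCacheKey({ sqft: 5000, options: { frequency: "weekly", edging: true } });
const keyB = stableCacheKey({ options: { edging: true, frequency: "weekly" }, sqft: 5000 });
console.log(keyA === keyB); // true
```

Array order is preserved on purpose, since element order usually carries meaning in request payloads.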

Pre-Deployment Checklist

Before deploying AI features:

Security

All secrets loaded from environment variables
No hardcoded API keys in code
Secret redaction applied to all loggers
.env added to .gitignore
.env.example committed (with placeholders)

Cost Control

Token budget calculated and configured
Cost monitoring/alerting implemented
Model selection justified (not over-provisioned)
Max token limits set on all completions
Caching implemented for repeated queries

Quality

Agent outputs validated with zod/joi schemas
Retry logic with exponential backoff
Timeout configured (prevent hung requests)
Fallback behavior defined for failures
Unit tests cover agent logic

Observability

Structured logging with request IDs
Token usage tracked per request
Latency metrics captured
Success/failure rates monitored
No PII in logs

Monitoring Dashboards

Track these metrics in your observability platform:

Cost Metrics:

Daily token usage (by agent, by model)
Daily cost estimate
Cost per customer interaction
Budget utilization percentage

Performance Metrics:

Agent latency (p50, p95, p99)
Success rate by agent
Retry rate
Cache hit rate
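Most observability platforms compute percentiles for you, but the definition is worth having on hand when sanity-checking a dashboard. A minimal sketch using the nearest-rank method; the sample latencies are made up for illustration:

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile of empty sample");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Illustrative latencies in milliseconds
const latenciesMs = [120, 340, 95, 1240, 410, 220, 180, 760, 150, 300];

const p50 = percentile(latenciesMs, 50);
const p95 = percentile(latenciesMs, 95);
const p99 = percentile(latenciesMs, 99);
console.log({ p50, p95, p99 }); // { p50: 220, p95: 1240, p99: 1240 }
```

With only ten samples, p95 and p99 both land on the slowest request, which is why tail percentiles are only meaningful over reasonably large windows.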

Quality Metrics:

Validation failure rate
Fallback invocation rate
User satisfaction scores
Agent output consistency

Anti-Patterns to Avoid

❌ Don't: Unbounded Context Windows

// Bad - includes entire conversation history
const messages = allMessages.map(m => ({ role: m.role, content: m.content }));

✅ Do: Sliding Window + Summarization

// Good - keep only recent context
const messages = [
  { role: "system", content: systemPrompt },
  { role: "assistant", content: conversationSummary },
  ...lastNMessages(5),
];
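A fuller sketch of the sliding window, assuming the conversation summary is produced elsewhere (for example by a periodic, cheap summarization call). Unlike the one-argument call above, this hypothetical `lastNMessages` takes the history explicitly so that the example is self-contained.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Keep only the n most recent messages (guarding n = 0, since slice(-0)
// would return the whole array)
function lastNMessages(history: ChatMessage[], n: number): ChatMessage[] {
  return n > 0 ? history.slice(-n) : [];
}

function buildWindowedMessages(
  systemPrompt: string,
  conversationSummary: string,
  history: ChatMessage[],
  windowSize = 5,
): ChatMessage[] {
  return [
    { role: "system", content: systemPrompt },
    { role: "assistant", content: conversationSummary },
    ...lastNMessages(history, windowSize),
  ];
}

// 12 alternating turns; only the last 5 are sent along with the summary
const history: ChatMessage[] = Array.from(
  { length: 12 },
  (_, i): ChatMessage => ({
    role: i % 2 === 0 ? "user" : "assistant",
    content: `message ${i}`,
  }),
);
const windowed = buildWindowedMessages("You are a helpful agent.", "Summary so far.", history);
console.log(windowed.length); // 7
```

The prompt size is now bounded by the window and summary length rather than growing with every turn.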

❌ Don't: Synchronous AI in Critical Path

// Bad - blocks user experience
const response = await aiAgent.execute(input);
return response;

✅ Do: Async Processing with Immediate Feedback

// Good - immediate response, AI processes async
await queue.enqueue({ userId, input });
return { status: "processing", estimatedTime: "30s" };
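The `queue` in the snippet above is not defined in this skill. As an illustration only, an in-memory version might look like the sketch below; a production system would more likely use a durable queue, and `InMemoryJobQueue` and its methods are hypothetical names.

```typescript
type Job = {
  id: string;
  userId: string;
  input: string;
  status: "processing" | "done" | "failed";
  result?: string;
};

class InMemoryJobQueue {
  private jobs = new Map<string, Job>();
  private pending: string[] = [];
  private nextId = 1;

  // Returns immediately with a job the client can poll by id
  enqueue(userId: string, input: string): Job {
    const job: Job = { id: `job-${this.nextId++}`, userId, input, status: "processing" };
    this.jobs.set(job.id, job);
    this.pending.push(job.id);
    return job;
  }

  get(id: string): Job | undefined {
    return this.jobs.get(id);
  }

  // Called by a background worker; `handler` stands in for the AI call
  async processNext(handler: (input: string) => Promise<string>): Promise<void> {
    const id = this.pending.shift();
    if (!id) return;
    const job = this.jobs.get(id)!;
    try {
      job.result = await handler(job.input);
      job.status = "done";
    } catch {
      job.status = "failed"; // the client sees a terminal state either way
    }
  }
}

const queue = new InMemoryJobQueue();
const job = queue.enqueue("user-1", "Quote for a 5000 sqft lawn");
console.log(job.id, job.status); // job-1 processing
```

The request handler returns `job.id` right away, and a separate worker loop calls `processNext` with the agent's `execute` method.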

❌ Don't: Over-Engineering Simple Logic

// Bad - AI for deterministic calculation
const quote = await aiAgent.calculate({ sqft: 5000, frequency: "weekly" });

✅ Do: Rules for Deterministic, AI for Ambiguous

// Good - simple math, no AI needed
const basePrice = sqft * 0.02 * frequencyMultiplier;

Compound Value Framework

AI should amplify customer outcomes, not just replace humans:

Value Multipliers:

Personalization at scale: Tailored recommendations for each customer
Proactive engagement: Anticipate needs before they're expressed
Continuous learning: Improve based on interaction patterns
24/7 availability: Instant responses outside business hours
Consistency: Same quality experience for every customer

Measure Impact:

Customer satisfaction scores
Time-to-resolution
Conversion rates (quote → booking)
Retention rates
Cost per acquisition (with AI vs. without)

Summary

Follow this skill to ensure:

Cost-effective AI: Right model for the task, token budgets enforced
Secure by default: Secrets never committed, always redacted
Observable operations: Track costs, performance, and quality
Customer value focus: AI amplifies outcomes rather than merely automating tasks

When in doubt, optimize for customer value per dollar spent, not just feature completeness.