
name: "AgentDB Learning Plugins"
description: "Create and train RL learning plugins with AgentDB's plugin system."

AgentDB Learning Plugins

CLI Quick Start

# Interactive wizard
npx agentdb@latest create-plugin

# Use specific template
npx agentdb@latest create-plugin -t decision-transformer -n my-agent

# Preview without creating
npx agentdb@latest create-plugin -t q-learning --dry-run

# Custom output directory
npx agentdb@latest create-plugin -t actor-critic -o ./plugins

# List available templates
npx agentdb@latest list-templates

# List installed plugins
npx agentdb@latest list-plugins

# Get plugin info
npx agentdb@latest plugin-info my-agent

API Quick Start

import { createAgentDBAdapter } from 'agentic-flow/reasoningbank';

const adapter = await createAgentDBAdapter({
  dbPath: '.agentdb/learning.db',
  enableLearning: true,
  enableReasoning: true,
  cacheSize: 1000,
});

Algorithm Templates and Configs

1. Decision Transformer (Recommended)

Offline RL -- learns from logged experiences without online interaction.

npx agentdb@latest create-plugin -t decision-transformer -n dt-agent

{
  "algorithm": "decision-transformer",
  "model_size": "base",
  "context_length": 20,
  "embed_dim": 128,
  "n_heads": 8,
  "n_layers": 6
}
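Decision Transformer conditions each step on the return-to-go (the sum of rewards from that step to the end of the logged episode) and only attends to the most recent context_length steps. The sketch below illustrates that preprocessing on a logged reward sequence. The helper names returnsToGo and contextWindow are hypothetical and are not part of the AgentDB API; the generated plugin may prepare its data differently.

```typescript
// Illustrative helpers; hypothetical names, not part of the AgentDB API.

// Return-to-go at step t: the sum of rewards from t through the end of the episode.
function returnsToGo(rewards: number[]): number[] {
  const rtg: number[] = rewards.map(() => 0);
  let running = 0;
  for (let t = rewards.length - 1; t >= 0; t--) {
    running += rewards[t] ?? 0;
    rtg[t] = running;
  }
  return rtg;
}

// The most recent `contextLength` items ending at index t (inclusive).
function contextWindow<T>(items: T[], t: number, contextLength: number): T[] {
  const start = Math.max(0, t - contextLength + 1);
  return items.slice(start, t + 1);
}

// Example: a five-step logged episode viewed through a 3-step context.
const episodeRewards = [0, 0, 1, 0, 2];
const episodeRtg = returnsToGo(episodeRewards);    // [3, 3, 3, 2, 2]
const recentRtg = contextWindow(episodeRtg, 4, 3); // [3, 2, 2]
```

With "context_length": 20 as in the config above, the window would cover up to the last 20 steps of each trajectory.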

2. Q-Learning

Off-policy, value-based. Best for discrete action spaces.

npx agentdb@latest create-plugin -t q-learning -n q-agent

{
  "algorithm": "q-learning",
  "learning_rate": 0.001,
  "gamma": 0.99,
  "epsilon": 0.1,
  "epsilon_decay": 0.995
}
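To make the config fields concrete, here is a minimal tabular sketch of how learning_rate, gamma, epsilon, and epsilon_decay typically enter a Q-Learning update. It is an assumption-laden illustration: QTable, qLearningUpdate, decayEpsilon, and the minEpsilon floor are hypothetical names, and the code AgentDB generates for this template may look quite different.

```typescript
// Hypothetical tabular sketch; not the code AgentDB generates.
type QTable = Record<string, number[]>;

interface QConfig {
  learning_rate: number;
  gamma: number;
  epsilon: number;
  epsilon_decay: number;
}

// Fetch (or lazily create) the row of action values for a state.
function qRow(table: QTable, state: string, nActions: number): number[] {
  let row = table[state];
  if (row === undefined) {
    row = [];
    for (let a = 0; a < nActions; a++) row.push(0);
    table[state] = row;
  }
  return row;
}

// One update step. Returns the TD error so callers can log or reuse it.
function qLearningUpdate(
  table: QTable, cfg: QConfig, state: string, action: number,
  reward: number, nextState: string, done: boolean, nActions: number,
): number {
  const row = qRow(table, state, nActions);
  let nextBest = 0;
  if (!done) {
    nextBest = -Infinity;
    for (const q of qRow(table, nextState, nActions)) {
      if (q > nextBest) nextBest = q;
    }
  }
  const current = row[action] ?? 0;
  const tdError = reward + cfg.gamma * nextBest - current;
  row[action] = current + cfg.learning_rate * tdError;
  return tdError;
}

// Exploration rate after one decay step, never dropping below a floor.
function decayEpsilon(cfg: QConfig, minEpsilon: number = 0.01): number {
  return Math.max(minEpsilon, cfg.epsilon * cfg.epsilon_decay);
}

const qConfig: QConfig = { learning_rate: 0.001, gamma: 0.99, epsilon: 0.1, epsilon_decay: 0.995 };
const qTable: QTable = {};
const firstTdError = qLearningUpdate(qTable, qConfig, 's0', 1, 1.0, 's1', false, 2);
```

Because the target uses the maximum over next actions regardless of which action is taken next, the update is off-policy, which matches the description above.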

3. SARSA

On-policy, value-based. More conservative than Q-Learning -- better for safety-critical tasks.

npx agentdb@latest create-plugin -t sarsa -n sarsa-agent

{
  "algorithm": "sarsa",
  "learning_rate": 0.001,
  "gamma": 0.99,
  "epsilon": 0.1
}
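The "more conservative" behavior comes from the bootstrap target. A small sketch, using hypothetical helper names that are not part of AgentDB, contrasts the two targets on the same transition:

```typescript
// Hypothetical helpers for comparison only; not AgentDB API.

// Q-Learning (off-policy): bootstrap from the best next action.
function qLearningTarget(reward: number, gamma: number, nextQ: number[], done: boolean): number {
  if (done) return reward;
  let best = -Infinity;
  for (const q of nextQ) {
    if (q > best) best = q;
  }
  return reward + gamma * best;
}

// SARSA (on-policy): bootstrap from the action the agent actually takes next.
function sarsaTarget(
  reward: number, gamma: number, nextQ: number[], nextAction: number, done: boolean,
): number {
  if (done) return reward;
  return reward + gamma * (nextQ[nextAction] ?? 0);
}

// The next state offers a safe action (value 1) and a risky one (value -5).
const nextQValues = [1, -5];
const exploratoryAction = 1; // epsilon-greedy happened to pick the risky action

const offPolicyTarget = qLearningTarget(0, 0.99, nextQValues, false);               // 0.99
const onPolicyTarget = sarsaTarget(0, 0.99, nextQValues, exploratoryAction, false); // about -4.95
```

SARSA's target includes the cost of the exploratory move, so states where exploration is dangerous end up with lower values and the learned policy tends to keep its distance from them.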

4. Actor-Critic

Policy gradient with value baseline. Works for continuous and discrete action spaces.

npx agentdb@latest create-plugin -t actor-critic -n ac-agent

{
  "algorithm": "actor-critic",
  "actor_lr": 0.001,
  "critic_lr": 0.002,
  "gamma": 0.99,
  "entropy_coef": 0.01
}
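As a rough guide to what these fields control, the sketch below computes the quantities a one-step advantage actor-critic usually optimizes for a discrete action distribution. It is a textbook-style illustration under stated assumptions, not AgentDB's implementation: tdAdvantage, policyEntropy, actorLoss, and criticLoss are hypothetical names.

```typescript
// Textbook-style sketch; hypothetical names, not AgentDB's implementation.

// One-step TD advantage: how much better the outcome was than the critic predicted.
function tdAdvantage(
  reward: number, value: number, nextValue: number, done: boolean, gamma: number,
): number {
  return reward + (done ? 0 : gamma * nextValue) - value;
}

// Shannon entropy (natural log) of a discrete action distribution.
function policyEntropy(probs: number[]): number {
  let h = 0;
  for (const p of probs) {
    if (p > 0) h -= p * Math.log(p);
  }
  return h;
}

// Actor loss: -log pi(a|s) * advantage, minus an entropy bonus that rewards exploration.
function actorLoss(probs: number[], action: number, adv: number, entropyCoef: number): number {
  const p = Math.max(probs[action] ?? 0, 1e-12); // avoid log(0)
  return -Math.log(p) * adv - entropyCoef * policyEntropy(probs);
}

// Critic loss: squared TD error (the value baseline being fit).
function criticLoss(adv: number): number {
  return adv * adv;
}

const acConfig = { actor_lr: 0.001, critic_lr: 0.002, gamma: 0.99, entropy_coef: 0.01 };
const sampleAdvantage = tdAdvantage(1, 0.5, 0.8, false, acConfig.gamma); // 1 + 0.792 - 0.5
const sampleActorLoss = actorLoss([0.5, 0.5], 0, sampleAdvantage, acConfig.entropy_coef);
const sampleCriticLoss = criticLoss(sampleAdvantage);
```

In this framing, actor_lr and critic_lr would scale the gradient steps taken on the actor and critic losses respectively.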

5. Curiosity-Driven

npx agentdb@latest create-plugin -t curiosity-driven -n curious-agent

Templates 6-10

Also listed by list-templates: active-learning, adversarial-training, curriculum-learning, federated-learning, multi-task-learning. These have no dedicated -t template value, so create them through the interactive wizard (create-plugin with no -t).

Training Workflow

Store Experiences

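// `step` is one logged transition; `computeEmbedding` is not defined in this guide, so supply your own embedding function.
await adapter.insertPattern({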
  id: '',
  type: 'experience',
  domain: 'task-domain',
  pattern_data: JSON.stringify({
    embedding: await computeEmbedding(JSON.stringify(step)),
    pattern: {
      state: step.state,
      action: step.action,
      reward: step.reward,
      next_state: step.next_state,
      done: step.done,
    },
  }),
  confidence: step.reward > 0 ? 0.9 : 0.5,
  usage_count: 1,
  success_count: step.reward > 0 ? 1 : 0,
  created_at: Date.now(),
  last_used: Date.now(),
});
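Building the record in a small pure function keeps the reward-to-confidence mapping in one place and makes it easy to unit test. The sketch below mirrors the field names used in the call above; the Step and ExperienceRecord types and the toExperienceRecord helper are hypothetical, and the embedding is passed in because this guide does not define how it is computed.

```typescript
// Hypothetical types and helper; field names mirror the insertPattern call above.
interface Step {
  state: number[];
  action: number;
  reward: number;
  next_state: number[];
  done: boolean;
}

interface ExperienceRecord {
  id: string;
  type: 'experience';
  domain: string;
  pattern_data: string;
  confidence: number;
  usage_count: number;
  success_count: number;
  created_at: number;
  last_used: number;
}

function toExperienceRecord(
  step: Step, embedding: number[], domain: string, now: number = Date.now(),
): ExperienceRecord {
  const succeeded = step.reward > 0;
  return {
    id: '',
    type: 'experience',
    domain,
    pattern_data: JSON.stringify({
      embedding,
      pattern: {
        state: step.state,
        action: step.action,
        reward: step.reward,
        next_state: step.next_state,
        done: step.done,
      },
    }),
    confidence: succeeded ? 0.9 : 0.5, // same mapping as the snippet above
    usage_count: 1,
    success_count: succeeded ? 1 : 0,
    created_at: now,
    last_used: now,
  };
}

const sampleRecord = toExperienceRecord(
  { state: [0], action: 2, reward: 1, next_state: [1], done: false },
  [0.1, 0.2],
  'task-domain',
  1000,
);
```

The resulting object could then be passed to adapter.insertPattern in place of the inline literal.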

Train

const metrics = await adapter.train({
  epochs: 100,
  batchSize: 64,
  learningRate: 0.001,
  validationSplit: 0.2,
});
// Returns: { loss, valLoss, duration, epochs }
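The returned metrics can feed a quick overfitting check by comparing training and validation loss. This sketch assumes loss and valLoss are single numbers, which the comment above suggests but does not confirm; TrainMetrics, generalizationGap, looksOverfit, and the 0.25 threshold are hypothetical choices for illustration.

```typescript
// Assumed shape, based on the "Returns" comment above; not a confirmed AgentDB type.
interface TrainMetrics {
  loss: number;
  valLoss: number;
  duration: number;
  epochs: number;
}

// Relative gap between validation and training loss (0 when training loss is not positive).
function generalizationGap(m: TrainMetrics): number {
  if (m.loss <= 0) return 0;
  return (m.valLoss - m.loss) / m.loss;
}

// Flags a run whose validation loss exceeds training loss by more than maxGap (relative).
// The 0.25 default is an arbitrary starting point, not a recommended value.
function looksOverfit(m: TrainMetrics, maxGap: number = 0.25): boolean {
  return generalizationGap(m) > maxGap;
}

const healthyRun: TrainMetrics = { loss: 0.1, valLoss: 0.11, duration: 1200, epochs: 100 };
const suspectRun: TrainMetrics = { loss: 0.1, valLoss: 0.2, duration: 1200, epochs: 100 };
const healthyFlag = looksOverfit(healthyRun); // false
const suspectFlag = looksOverfit(suspectRun); // true
```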

Evaluate

const result = await adapter.retrieveWithReasoning(testQuery, {
  domain: 'task-domain',
  k: 10,
  synthesizeContext: true,
});
// Use the closest match, guarding against an empty result set.
const best = result.memories[0];
const suggestedAction = best?.pattern.action;
const confidence = best?.similarity ?? 0;
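Relying on the single closest memory can be noisy when k is 10. One alternative is a similarity-weighted vote across all retrieved memories, sketched below. The RetrievedMemory shape is inferred from the snippet above (a pattern.action and a similarity per memory) and is an assumption, as are the names voteAction and ActionVote; actions are treated as strings here purely for illustration.

```typescript
// Assumed result shape, inferred from the evaluation snippet; not a confirmed AgentDB type.
interface RetrievedMemory {
  pattern: { action: string };
  similarity: number;
}

interface ActionVote {
  action: string;
  score: number; // share of total similarity mass, in [0, 1]
}

// Sum similarity per action and return the action with the largest share.
function voteAction(memories: RetrievedMemory[]): ActionVote | undefined {
  const totals: Record<string, number> = {};
  let total = 0;
  for (const m of memories) {
    totals[m.pattern.action] = (totals[m.pattern.action] ?? 0) + m.similarity;
    total += m.similarity;
  }
  let best: ActionVote | undefined = undefined;
  for (const action of Object.keys(totals)) {
    const score = total > 0 ? (totals[action] ?? 0) / total : 0;
    if (best === undefined || score > best.score) best = { action, score };
  }
  return best;
}

const sampleVote = voteAction([
  { pattern: { action: 'left' }, similarity: 0.9 },
  { pattern: { action: 'right' }, similarity: 0.5 },
  { pattern: { action: 'right' }, similarity: 0.5 },
]);
// 'right' wins: two moderately similar memories outweigh one closer match.
```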

Prioritized Experience Replay

// Store with TD error as priority
await adapter.insertPattern({
  // ... standard fields
  confidence: Math.min(1, Math.abs(tdError)),  // |TD error| clamped to [0, 1] = priority
});

// Retrieve only high-priority experiences
const highPriority = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'task-domain',
  k: 32,
  minConfidence: 0.7,
});
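TD errors can be negative or larger than 1, while the confidence values used elsewhere in this guide sit between 0 and 1, so it helps to map the error to a bounded priority before storing it. The sketch below shows one way to do that, plus the proportional sampling weights commonly used in prioritized replay. tdPriority, samplingProbabilities, and their parameters are hypothetical names; AgentDB itself only sees the resulting confidence value.

```typescript
// Hypothetical helpers; AgentDB only receives the resulting confidence value.

// Map a TD error to a priority in [0, 1]: magnitude matters, sign does not.
function tdPriority(tdError: number, maxAbsError: number = 1): number {
  if (maxAbsError <= 0) return 0;
  return Math.min(1, Math.abs(tdError) / maxAbsError);
}

// Proportional prioritization: P(i) = p_i^alpha / sum_j p_j^alpha.
// alpha = 0 gives uniform sampling; alpha = 1 samples in direct proportion to priority.
function samplingProbabilities(priorities: number[], alpha: number = 0.6): number[] {
  const weights = priorities.map((p) => Math.pow(Math.max(p, 0), alpha));
  let total = 0;
  for (const w of weights) total += w;
  if (total === 0) return priorities.map(() => 1 / priorities.length);
  return weights.map((w) => w / total);
}

const clippedPriority = tdPriority(-2.5);   // 1: large negative errors are still high priority
const scaledPriority = tdPriority(-0.5, 2); // 0.25
const replayProbs = samplingProbabilities([1, 1, 2], 1); // [0.25, 0.25, 0.5]
```

With a minConfidence of 0.7 as in the query above, only experiences whose scaled error magnitude reaches 0.7 would be retrieved.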

Multi-Agent Training

for (const agent of agents) {
  const experience = await agent.step();
  await adapter.insertPattern({
    domain: `multi-agent/${agent.id}`,
    // ... experience data
  });
}

// Train once on the pooled experiences from all agents.
await adapter.train({ epochs: 50, batchSize: 64 });

Combined Learning + Reasoning

await adapter.train({ epochs: 50, batchSize: 32 });

const result = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'decision-making',
  k: 10,
  useMMR: true,
  synthesizeContext: true,
  optimizeMemory: true,
});
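The useMMR option presumably refers to Maximal Marginal Relevance, a reranking scheme that trades relevance to the query against redundancy among the results already chosen. That reading is an assumption based on the option name. The sketch below shows the general technique on plain vectors; cosine, mmrSelect, and the lambda parameter are hypothetical and say nothing about how AgentDB implements it internally.

```typescript
// General MMR sketch; hypothetical names, not AgentDB's internal implementation.

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Greedy selection: each round picks the candidate maximizing
//   lambda * sim(query, d) - (1 - lambda) * max over selected s of sim(d, s).
// Returns indices into `candidates` in selection order.
function mmrSelect(query: number[], candidates: number[][], k: number, lambda: number = 0.5): number[] {
  const selected: number[] = [];
  const remaining: number[] = candidates.map((_, i) => i);
  while (selected.length < k && remaining.length > 0) {
    let bestPos = 0;
    let bestScore = -Infinity;
    for (let pos = 0; pos < remaining.length; pos++) {
      const cand = candidates[remaining[pos] ?? 0] ?? [];
      const relevance = cosine(query, cand);
      let redundancy = 0;
      for (const s of selected) {
        redundancy = Math.max(redundancy, cosine(cand, candidates[s] ?? []));
      }
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestPos = pos;
      }
    }
    selected.push(remaining[bestPos] ?? 0);
    remaining.splice(bestPos, 1);
  }
  return selected;
}

// Candidates 0 and 1 are near-duplicates; candidate 2 is less relevant but different.
const mmrCandidates = [[1, 0.1], [1, 0.12], [1, -1]];
const diversePick = mmrSelect([1, 0], mmrCandidates, 2, 0.5);  // [0, 2]
const relevanceOnlyPick = mmrSelect([1, 0], mmrCandidates, 2, 1); // [0, 1]
```

With lambda at 1 the ranking reduces to plain similarity search; lowering it pushes near-duplicate memories out of the top k.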

Troubleshooting

Not converging: Lower learningRate (try 0.0001).

Overfitting: Set validationSplit: 0.2 in train() and optimizeMemory: true in retrieveWithReasoning() to consolidate patterns.

Slow training: Enable quantization (quantizationType: 'binary').