pbkv-agent-workflow


Prediction-based KV-Cache management for efficient serving of dynamic agent workflows. Predicts future agent invocations to optimize cache eviction and prefetching.

By hiyenwong, updated 6/4/2026

name: pbkv-agent-workflow
description: Prediction-based KV-Cache management for efficient serving of dynamic agent workflows. Predicts future agent invocations to optimize cache eviction and prefetching.
category: deep-learning
tags: [LLM, KV-cache, agent-workflow, serving, inference-optimization, prediction]
trigger: pbkv, kv-cache management, agent workflow serving, dynamic workflow, cache prediction, KVFlow

PBKV: Prediction-Based KV-Cache Management for Agent Workflows

Overview

PBKV optimizes KV-Cache management for dynamic LLM agent workflows by predicting future agent invocations and using these predictions to guide cache eviction and prefetching decisions.

Core Technique

  1. Workflow Prediction Model: Fuses historical workflow patterns with the current task context to predict the agent invocation sequence for the next several steps
  2. Reuse Potential Estimation: Uses these predictions to estimate which cache entries will be reused, and prioritizes keeping them in GPU memory
  3. Conservative Policy: Applies predictions conservatively during both cache eviction and prefetching, so that prediction errors do little damage
  4. Dynamic Adaptation: Handles workflows where the agent sequence depends on the task context (unlike prior work, which assumes static workflows)
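
To make the first item concrete, here is a minimal sketch of a predictor that blends history-wide transition counts with counts conditioned on the task context. It is an illustration only, not the PBKV model: the class `WorkflowPredictor`, the `task_type` context label, the `context_weight` blend, and the agent names are all assumptions introduced for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


class WorkflowPredictor:
    """Count-based next-agent predictor (hypothetical stand-in for a learned model)."""

    def __init__(self, context_weight: float = 0.5) -> None:
        self.context_weight = context_weight  # assumed blend factor
        self.by_agent: Dict[str, Counter] = defaultdict(Counter)
        self.by_context: Dict[Tuple[str, str], Counter] = defaultdict(Counter)

    def fit(self, traces: Sequence[Tuple[str, List[str]]]) -> None:
        """Each trace is (task_type, [agent names in invocation order])."""
        for task_type, agents in traces:
            for cur, nxt in zip(agents, agents[1:]):
                self.by_agent[cur][nxt] += 1
                self.by_context[(task_type, cur)][nxt] += 1

    @staticmethod
    def _normalize(counts: Counter) -> Dict[str, float]:
        total = sum(counts.values())
        return {a: c / total for a, c in counts.items()} if total else {}

    def predict_next(self, task_type: str, current: str) -> Dict[str, float]:
        """Blend history-wide and context-specific transition probabilities."""
        hist = self._normalize(self.by_agent[current])
        ctx = self._normalize(self.by_context[(task_type, current)])
        if not ctx:  # unseen context: rely on history alone
            return hist
        w = self.context_weight
        return {a: (1 - w) * hist.get(a, 0.0) + w * ctx.get(a, 0.0)
                for a in set(hist) | set(ctx)}

    def predict_steps(self, task_type: str, current: str, n: int) -> List[Dict[str, float]]:
        """Roll the distribution forward n steps; mass at terminal agents is dropped."""
        dist = {current: 1.0}
        steps = []
        for _ in range(n):
            nxt: Dict[str, float] = defaultdict(float)
            for agent, p in dist.items():
                for b, q in self.predict_next(task_type, agent).items():
                    nxt[b] += p * q
            dist = dict(nxt)
            steps.append(dist)
        return steps


traces = [
    ("bugfix", ["planner", "coder", "tester"]),
    ("bugfix", ["planner", "coder", "tester"]),
    ("docs", ["planner", "writer"]),
]
predictor = WorkflowPredictor()
predictor.fit(traces)
steps = predictor.predict_steps("bugfix", "planner", n=2)
```

In this toy data the "bugfix" context shifts probability toward `coder` after `planner`, while an unseen task type falls back to the history-wide counts. A real deployment would likely use richer context features than a single label.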

Key Benefits

  • 1.85x speedup over LRU on dynamic workflows
  • 1.26x speedup over KVFlow (SOTA) even on static workflows
  • Robust to prediction errors via conservative cache management

Implementation Steps

  1. Collect historical workflow execution traces (agent sequences, contexts)
  2. Train a lightweight prediction model: input = (current context + history), output = next N agent invocations
  3. At each step, predict future agents and estimate cache reuse potential
  4. Evict low-potential entries first; prefetch high-potential entries conservatively
  5. Fall back to safe eviction when prediction confidence is low
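
Steps 3 to 5 above can be sketched as follows. This is a hedged illustration under stated assumptions, not the PBKV implementation: the discounted scoring formula, the `Entry` record, the thresholds `min_confidence` and `min_potential`, and the choice of LRU as the "safe" fallback are all hypothetical. The `confidence` value is taken as given; one plausible choice is the top probability of the first predicted step.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    agent: str      # agent whose KV-cache this entry holds
    size: int       # blocks the entry occupies
    last_used: int  # logical timestamp of the last access


def reuse_potential(step_probs: List[Dict[str, float]],
                    discount: float = 0.8) -> Dict[str, float]:
    """Sum predicted invocation probabilities, weighting nearer steps more."""
    scores: Dict[str, float] = {}
    for k, probs in enumerate(step_probs):
        for agent, p in probs.items():
            scores[agent] = scores.get(agent, 0.0) + (discount ** k) * p
    return scores


def pick_victims(resident: List[Entry], potential: Dict[str, float],
                 confidence: float, need: int,
                 min_confidence: float = 0.6) -> List[str]:
    """Choose entries to evict until `need` blocks are freed."""
    if confidence >= min_confidence:
        # Trust the prediction: lowest reuse potential goes first, LRU breaks ties.
        order = sorted(resident, key=lambda e: (potential.get(e.agent, 0.0), e.last_used))
    else:
        # Low confidence: ignore predictions and fall back to plain LRU.
        order = sorted(resident, key=lambda e: e.last_used)
    victims, freed = [], 0
    for entry in order:
        if freed >= need:
            break
        victims.append(entry.agent)
        freed += entry.size
    return victims


def pick_prefetch(offloaded: List[Entry], potential: Dict[str, float],
                  free_blocks: int, min_potential: float = 0.5) -> List[str]:
    """Prefetch only high-potential entries that fit in already-free space."""
    chosen = []
    ranked = sorted(offloaded, key=lambda e: potential.get(e.agent, 0.0), reverse=True)
    for entry in ranked:
        if potential.get(entry.agent, 0.0) < min_potential:
            break  # everything after this is below the threshold too
        if entry.size <= free_blocks:
            chosen.append(entry.agent)
            free_blocks -= entry.size
    return chosen


step_probs = [{"coder": 0.9, "planner": 0.1}, {"tester": 0.7, "coder": 0.3}]
potential = reuse_potential(step_probs)
resident = [Entry("planner", 4, last_used=3), Entry("coder", 4, last_used=1)]
offloaded = [Entry("tester", 2, last_used=0), Entry("reviewer", 4, last_used=0)]

confident = pick_victims(resident, potential, confidence=0.9, need=4)
fallback = pick_victims(resident, potential, confidence=0.3, need=4)
prefetch = pick_prefetch(offloaded, potential, free_blocks=4)
```

With a confident prediction the recently used but unlikely-to-recur `planner` entry is evicted, whereas the LRU fallback would evict `coder`, which the prediction expects to run next. Prefetching never evicts anything in this sketch, which is one simple way to keep it conservative.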

Pitfalls

  • The prediction model must be lightweight: a heavy model's inference cost negates the cache management savings
  • The conservative policy may leave suboptimal entries in cache; tune the confidence threshold
  • Distribution shift away from the historical training data degrades prediction accuracy over time
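
One way to catch the distribution-shift pitfall is to track how often recent predictions matched the agent that actually ran, and to stop trusting predictions when that rate drops. The sketch below is an assumption-laden illustration: `AccuracyMonitor`, its window size, and its `min_hit_rate` threshold are hypothetical names and values, not part of PBKV.

```python
from collections import deque
from typing import Deque, Dict


class AccuracyMonitor:
    """Rolling top-k hit rate of next-agent predictions (hypothetical helper)."""

    def __init__(self, window: int = 100, top_k: int = 1,
                 min_hit_rate: float = 0.6) -> None:
        self.hits: Deque[bool] = deque(maxlen=window)  # old results fall off
        self.top_k = top_k
        self.min_hit_rate = min_hit_rate

    def record(self, predicted: Dict[str, float], actual: str) -> None:
        """Store whether the actual agent was among the top-k predicted agents."""
        top = sorted(predicted, key=predicted.get, reverse=True)[: self.top_k]
        self.hits.append(actual in top)

    @property
    def hit_rate(self) -> float:
        # With no observations yet, report 0.0 so callers stay conservative.
        return sum(self.hits) / len(self.hits) if self.hits else 0.0

    def predictions_trusted(self) -> bool:
        return self.hit_rate >= self.min_hit_rate


monitor = AccuracyMonitor(window=4, min_hit_rate=0.5)
for _ in range(4):
    monitor.record({"coder": 0.8, "writer": 0.2}, actual="coder")
trusted_before = monitor.predictions_trusted()

# The workload shifts: the predictor keeps saying "coder" but "writer" runs.
for _ in range(3):
    monitor.record({"coder": 0.8, "writer": 0.2}, actual="writer")
trusted_after = monitor.predictions_trusted()
```

When `predictions_trusted()` turns false, a serving loop could switch to the safe eviction path and schedule retraining on fresh traces.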

Activation Keywords

pbkv, kv-cache management, agent workflow serving, dynamic workflow, cache prediction, KVFlow, LRU replacement, inference optimization

Install via CLI
npx skills add https://github.com/hiyenwong/ai_collection --skill pbkv-agent-workflow
Repository Details
Stars: 1
Forks: 0
Branch: main
Path: SKILL.md