oxy-task-spec-default

name: oxy-task-spec-default
description: Use when adding or modifying code in `crates/app/src/server/` or HTTP handlers that involves long-running compute, periodic schedules, multi-step pipelines, or any work that must survive instance death. Triggers include "spawn a background task", "schedule periodically", "run async after returning", "long-running operation", "fire and forget", "tokio::spawn", "background job", "queue this", or PRs that add new work to HTTP handlers. Also triggers when designing features that touch LLM APIs, git clones, embedding builds, or any operation taking >5 seconds.

Default long-running work to TaskSpec on the worker fleet

Per Oxy's scaling design (internal-docs/2026-05-31-scaling-oxy-multi-instance-architecture.md, refinement H):

If a feature has any of {long-running compute, periodic schedule, durability requirement, multi-step pipeline}, it should be a TaskSpec on the worker fleet, not a tokio::spawn in an HTTP handler.

What this means concretely

New background work defaults to enqueueing a TaskSpec in agentic_task_queue via the existing durable orchestrator in crates/agentic/runtime/src/orchestrator/.
New tokio::spawn inside HTTP handlers requires written justification (sub-second fire-and-forget only).
The worker fleet (oxy worker) drains the queue; the HTTP fleet (oxy serve) only enqueues.
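
The enqueue-only split above can be sketched as follows. This is a minimal illustration, not the real Oxy API: `InMemoryQueue`, `handle_clone_request`, and the `CloneRepo` variant are hypothetical names, and the in-memory queue stands in for the durable agentic_task_queue.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the real TaskSpec enum.
#[derive(Debug, Clone, PartialEq)]
enum TaskSpec {
    CloneRepo { repo_url: String, target_path: String },
}

/// Hypothetical stand-in for the durable queue (agentic_task_queue in the real system).
#[derive(Default)]
struct InMemoryQueue {
    next_id: u64,
    pending: VecDeque<(u64, TaskSpec)>,
}

impl InMemoryQueue {
    /// HTTP side: record the work and return a task id immediately.
    fn enqueue(&mut self, spec: TaskSpec) -> u64 {
        self.next_id += 1;
        self.pending.push_back((self.next_id, spec));
        self.next_id
    }

    /// Worker side: take the oldest pending task, if any.
    fn dequeue(&mut self) -> Option<(u64, TaskSpec)> {
        self.pending.pop_front()
    }
}

/// Handler-shaped function: it does no heavy work itself. It only enqueues
/// and hands back an id that the client can poll for status.
fn handle_clone_request(queue: &mut InMemoryQueue, repo_url: &str, target_path: &str) -> u64 {
    queue.enqueue(TaskSpec::CloneRepo {
        repo_url: repo_url.to_string(),
        target_path: target_path.to_string(),
    })
}
```

The point of the shape is that the handler returns as soon as the task is recorded, so the request never depends on the instance staying alive for the duration of the clone.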

When to default to TaskSpec — quick checklist

Does it take more than ~5 seconds? → TaskSpec.
Does it touch the network, LLM APIs, or external services? → TaskSpec.
Must it survive oxy serve restart? → TaskSpec.
Will the client poll for status? → TaskSpec.
Does it run on a schedule? → TaskSpec.
Could it be retried? → TaskSpec.

If any of these is yes, default to TaskSpec.
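
The checklist reduces to a single "any of these" predicate. The sketch below encodes it that way for illustration; `WorkProfile` and `should_be_task_spec` are hypothetical names, and the 5-second threshold is the approximate figure from the checklist, not a hard limit.

```rust
/// Hypothetical description of a piece of background work,
/// one field per checklist question.
#[derive(Debug, Default, Clone, Copy)]
struct WorkProfile {
    expected_secs: f64,
    touches_network: bool,
    must_survive_restart: bool,
    client_polls_status: bool,
    scheduled: bool,
    retryable: bool,
}

/// Returns true if any checklist question is answered "yes".
fn should_be_task_spec(w: &WorkProfile) -> bool {
    w.expected_secs > 5.0
        || w.touches_network
        || w.must_survive_restart
        || w.client_polls_status
        || w.scheduled
        || w.retryable
}
```

A sub-second metric emit answers "no" to every question and stays eligible for tokio::spawn; a git clone answers "yes" to several and defaults to TaskSpec.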

How to add a new TaskSpec variant

Define the variant in the TaskSpec enum (find it under crates/agentic/runtime/src/). Carry only what the executor needs, and pass sensitive data by reference rather than by value (e.g., auth_token_ref, not the raw token).
Implement the executor branch — invoke the actual work, emit lifecycle events (Started, Progress(_), Done / Failed).
Wire the enqueue site — replace the tokio::spawn with transport.enqueue(TaskSpec::YourVariant{...}).
Status tracking — domain-specific status columns are updated by the executor as the single writer. Never spawn AND update from the same call site.
Idempotency — if the same logical work is enqueued twice (e.g., same repo URL + target path), dedupe on a stable key before INSERT.
Retry policy — set a TaskPolicy with backoff. Don't retry terminal errors (auth expired, permission denied); do retry transient ones (network blip).
Tests — unit-test the executor happy path + at least one failure mode + idempotency.
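
The idempotency and retry steps above can be sketched as follows. Everything here is an assumption made for illustration: the fields of `TaskPolicy`, the `TaskError` variants, `dedupe_key`, and `EnqueueGuard` are hypothetical, and the real orchestrator may define its policy and dedupe differently (for example, with a unique index in the queue table).

```rust
use std::collections::HashSet;
use std::time::Duration;

/// Hypothetical error type separating terminal from transient failures.
#[derive(Debug, Clone, PartialEq)]
enum TaskError {
    AuthExpired,
    PermissionDenied,
    Network(String),
}

impl TaskError {
    /// Only transient errors are worth retrying.
    fn is_retryable(&self) -> bool {
        matches!(self, TaskError::Network(_))
    }
}

/// Hypothetical shape for a retry policy with capped exponential backoff.
#[derive(Debug, Clone)]
struct TaskPolicy {
    max_attempts: u32,
    base_delay: Duration,
    max_delay: Duration,
}

impl TaskPolicy {
    /// Delay after the given failed attempt (1-based): base * 2^(attempt - 1), capped.
    fn backoff(&self, attempt: u32) -> Duration {
        let exp = attempt.saturating_sub(1).min(16);
        self.base_delay.saturating_mul(1u32 << exp).min(self.max_delay)
    }

    /// Some(delay) if the task should be retried, None if it should fail for good.
    fn next_retry(&self, attempt: u32, err: &TaskError) -> Option<Duration> {
        if err.is_retryable() && attempt < self.max_attempts {
            Some(self.backoff(attempt))
        } else {
            None
        }
    }
}

/// Stable key for "the same logical work": same repo URL and target path.
fn dedupe_key(repo_url: &str, target_path: &str) -> String {
    format!("clone:{}:{}", repo_url.trim_end_matches('/'), target_path)
}

/// In-memory stand-in for a dedupe check performed before INSERT.
#[derive(Default)]
struct EnqueueGuard {
    seen: HashSet<String>,
}

impl EnqueueGuard {
    /// Returns true the first time a key is seen, false for duplicates.
    fn first_time(&mut self, key: String) -> bool {
        self.seen.insert(key)
    }
}
```

Keeping the retryable/terminal decision on the error type means the executor does not have to repeat that classification at every call site, and the tests for "at least one failure mode" can target `next_retry` directly.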

Counter-examples — when `tokio::spawn` is acceptable

Sub-second fire-and-forget that genuinely doesn't need durability (e.g., emit a metric).
Per-request streaming aggregation that ends when the SSE connection closes.
Background work tied to a single SSE stream's lifecycle and reset on reconnect.

In all other cases, prefer TaskSpec.

Where to look for existing examples

Workflow execution: TaskSpec::Workflow — the most mature executor.
Airway ELT: TaskSpec::Airway — heavy IO + retries + progress events.
Builder analytics: TaskSpec::Agent — LLM-driven multi-step pipeline.
Preagg cycles: TaskSpec::Preagg — periodic refresh.

Refs

Design doc: internal-docs/2026-05-31-scaling-oxy-multi-instance-architecture.md (refinement H)
Scope survey of current violations: internal-docs/2026-05-28-worker-fleet-scope-survey.md
Worker fleet guide: internal-docs/worker-fleet.md
Backend architecture rules: internal-docs/backend-architecture.md
