buildkite-pipelines - SKILL.md Agent Skill

name: buildkite-pipelines
description: Buildkite CI pipelines (pipeline YAML, steps, agents, artifacts, test splitting, dynamic pipelines)
domain: devops

Overview

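Buildkite is a CI/CD platform where pipelines are defined in YAML and executed by self-hosted agents. Its hybrid architecture keeps scheduling and the web UI in Buildkite's hosted control plane while builds run on agents in your own infrastructure, which is where its reputation for speed and flexibility comes from.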

Capabilities

YAML pipeline configuration
Self-hosted agents (any OS, any cloud)
Test splitting and parallelism
Artifact upload/download between steps
Dynamic pipeline generation
Block steps for manual approval

When to Use

Need fast CI with self-hosted runners
Want test splitting for parallel execution
Need hybrid cloud CI (agents anywhere)
Complex pipelines with manual gates

When NOT to Use

Task is outside your authorization scope
You need to implement controls (use implementing-* skills)
Task is about analysis, not action (use analyzing-* skills)
You don't have access to target systems
Task requires compliance expertise (consult professionals)
Task is about defense, not offense (use defensive skills)

Pseudo Code

The buildkite-pipelines workflow follows a standard pipeline pattern.

Core flow:

# buildkite-pipelines primary flow
input = prepare(raw_data)
result = process(input, config={agents, artifacts, buildkite, dynamic, pipeline})
validate(result)
deliver(result)

Error handling:

on error:
  log(error_details)
  retry_with_backoff(max=3)
  if still_failing: alert_and_escalate()
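
The error-handling flow above can be sketched as a small bash helper. This is a minimal illustration, not part of Buildkite itself: the function name mirrors the pseudocode, `RETRY_BASE_DELAY` is a hypothetical knob for the initial delay in seconds, and alerting is left to the caller.

```bash
# retry_with_backoff MAX CMD [ARGS...]
# Run CMD up to MAX times, doubling the delay between attempts.
# RETRY_BASE_DELAY (hypothetical, default 1) sets the first delay in seconds.
retry_with_backoff() {
  local max="$1"
  local attempt=1
  local delay="${RETRY_BASE_DELAY:-1}"
  shift
  while true; do
    # Success on any attempt ends the loop immediately.
    "$@" && return 0
    echo "attempt $attempt of $max failed: $*" >&2
    if [ "$attempt" -ge "$max" ]; then
      # Caller decides how to alert and escalate on a non-zero return.
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

A step command could then call `retry_with_backoff 3 ./deploy.sh`. Buildkite steps also have a built-in `retry` attribute, which is usually the better fit when the whole job should be retried rather than a single command.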

Pipeline Configuration

# .buildkite/pipeline.yml
steps:
  - label: ":hammer: Test"
    command: npm ci && npm test
    agents:
      queue: default

  - wait

  - label: ":rocket: Deploy"
    command: ./deploy.sh
    branches: main
    agents:
      queue: production

Test Splitting

steps:
  - label: ":jest: Tests %n"
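    command:
      - npm ci
      # Shard the test files round-robin across the parallel jobs (needs GNU split;
      # the tests/**/*.test.js layout is an assumption). BUILDKITE_PARALLEL_JOB is zero-based.
      # "$$" stops Buildkite from interpolating the expression at pipeline upload time.
      - TEST_FILES=$$(find tests -name '*.test.js' | sort | split -n r/$$((BUILDKITE_PARALLEL_JOB + 1))/$$BUILDKITE_PARALLEL_JOB_COUNT)
      - npx jest $$TEST_FILES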
    parallelism: 4
    agents:
      queue: default
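
Splitting a file list across parallel jobs amounts to round-robin sharding keyed on `BUILDKITE_PARALLEL_JOB` (zero-based index) and `BUILDKITE_PARALLEL_JOB_COUNT`, both of which Buildkite sets when `parallelism` is used. The helper below is a portable sketch of that idea using `awk`. The name `shard_lines` and the `*.test.js` file pattern in the usage comment are assumptions for illustration.

```bash
# shard_lines INDEX COUNT
# Read lines on stdin and print those whose zero-based position
# modulo COUNT equals INDEX (round-robin sharding).
shard_lines() {
  local index="$1"
  local count="$2"
  awk -v i="$index" -v n="$count" '(NR - 1) % n == i'
}

# Inside a parallel job this might be used as:
#   find tests -name '*.test.js' | sort \
#     | shard_lines "$BUILDKITE_PARALLEL_JOB" "$BUILDKITE_PARALLEL_JOB_COUNT"
```

Sorting the file list first matters: every job must see the same ordering, otherwise files can be skipped or run twice. Round-robin balances file counts, not run times, so uneven suites may still need timing-based splitting.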

Dynamic Pipeline

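#!/bin/bash
# generate-pipeline.sh: emit one build step per directory under services/
set -euo pipefail

# Group the output so that "steps:" is piped to the upload together with the steps.
{
  echo "steps:"
  for dir in services/*/; do
    service=$(basename "$dir")
    echo "  - label: ':docker: Build $service'"
    echo "    command: docker build services/$service"
  done
} | buildkite-agent pipeline upload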
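
Keeping the YAML generation in a function, separate from the upload, makes a dynamic pipeline easy to inspect locally before any agent sees it. The sketch below assumes service names are passed as arguments. `emit_pipeline` and the example service names are hypothetical.

```bash
# emit_pipeline NAME...
# Print a pipeline with one Docker build step per service name.
emit_pipeline() {
  local service
  echo "steps:"
  for service in "$@"; do
    echo "  - label: \":docker: Build ${service}\""
    echo "    command: \"docker build services/${service}\""
  done
}

# In a job, the output would be piped to the agent:
#   emit_pipeline api web | buildkite-agent pipeline upload
```

Running `emit_pipeline api web` in a terminal prints the generated YAML, so a malformed step shows up before upload rather than as a failed build.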

Manual Approval Gate

steps:
  - block: ":rocket: Approve Production Deploy"
    key: "approve-deploy"
    fields:
      - select: "Environment"
        key: "env"
        options:
          - label: "Production"
            value: "prod"
          - label: "Staging"
            value: "staging"

  - label: ":ship: Deploy"
    command: ./deploy.sh
    # depends_on references a step key, not a label
    depends_on: "approve-deploy"
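
The value chosen in a block step's field is stored in build meta-data under the field's `key` (`env` here), and later jobs can read it with `buildkite-agent meta-data get`. The sketch below shows one way a deploy script might validate that value. `deploy_target` and the target names it returns are hypothetical.

```bash
# deploy_target VALUE
# Map the block step's "env" select value to a deploy target name.
# Unknown values fail loudly instead of deploying somewhere unexpected.
deploy_target() {
  case "$1" in
    prod)    echo "production" ;;
    staging) echo "staging" ;;
    *)       echo "unknown environment: '$1'" >&2; return 1 ;;
  esac
}

# Inside the deploy job the value would come from build meta-data:
#   target=$(deploy_target "$(buildkite-agent meta-data get env)")
```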

Common Patterns

Agents queue: agents: { queue: "deploy" } routes to specific agent pools
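Artifacts: buildkite-agent artifact upload build.zip in one step, then buildkite-agent artifact download build.zip . in a later step (download takes a destination directory)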
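Hooks: Agent-level hooks live in the directory set by the agent's hooks-path setting (for example ~/.buildkite-agent/hooks/ on a manual install, /etc/buildkite-agent/hooks/ on Linux packages)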
Plugins: plugins: [docker#v5.0.0: { image: "node:20" }]
Notify: Slack/email notifications on failure via notify block
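
As a rough sketch of the artifact pattern, the helpers below wrap the two `buildkite-agent artifact` subcommands. Note that `artifact download` takes a destination as its second argument. The function names and the `AGENT` override are assumptions, added only so the commands can be exercised without a real agent.

```bash
# AGENT is a hypothetical override; by default the real agent binary is used.
AGENT="${AGENT:-buildkite-agent}"

# upload_artifact PATH_OR_GLOB
# Upload matching files from the current job's working directory.
upload_artifact() {
  "$AGENT" artifact upload "$1"
}

# download_artifact QUERY [DEST]
# Download matching artifacts from the build into DEST (default: current directory).
download_artifact() {
  "$AGENT" artifact download "$1" "${2:-.}"
}
```

A build step might call `upload_artifact build.zip`, with a later step, placed after a `wait`, calling `download_artifact build.zip` before deploying.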

How to Use

Define infrastructure as code (Terraform, CloudFormation, Pulumi)
Review changes through PR process before applying
Configure monitoring and alerting for critical paths
Set up secrets management (Vault, AWS Secrets Manager, etc.)
Document runbooks for deployment, rollback, and incident response
Test disaster recovery procedures regularly

Red Flags

Infrastructure changes without review: Unreviewed changes cause outages — use PRs for infra code
No rollback strategy: Every deployment needs a tested rollback plan before it runs
Secrets in configuration files: Secrets in YAML/JSON get committed to version control
Missing monitoring and alerting: Without monitoring, outages go undetected until users report them
No documentation for runbooks: Without runbooks, on-call engineers waste time re-discovering procedures