name: provisioning-cluster-for-production description: Guides initial CockroachDB cluster provisioning and production deployment. Self-Hosted covers cockroach start/init, Kubernetes deployment (Operator, Helm), hardware sizing, and production configuration. Advanced/BYOC covers Cloud Console, API, and Terraform provisioning with production settings. Standard covers cluster creation and provisioned compute selection. Basic covers cluster creation and spending limits. Use when creating a new cluster, preparing for production go-live, or validating deployment configuration. compatibility: Self-Hosted requires CLI access and infrastructure provisioning. Advanced/BYOC requires Cloud Console with Org Admin role. Standard requires Cloud Console. Basic requires Cloud Console. metadata: author: cockroachdb version: "1.0"
Provisioning Cluster for Production
Guides CockroachDB cluster creation and production deployment configuration. Before providing procedures, this skill gathers context to deliver tier-appropriate provisioning steps and production hardening guidance.
When to Use This Skill
- Creating a new CockroachDB cluster
- Preparing a development/staging cluster for production go-live
- Validating hardware and configuration for production readiness
- Choosing the right deployment tier and sizing
For post-deployment health checks: Use reviewing-cluster-health. For ongoing settings management: Use managing-cluster-settings. For capacity changes after deployment: Use managing-cluster-capacity.
Step 1: Gather Context
Required Context
| Question | Options | Why It Matters |
|---|---|---|
| Deployment tier? | Self-Hosted, Advanced, BYOC, Standard, Basic | Completely different provisioning procedures |
| Environment? | Production, Staging, Development | Determines hardware sizing and configuration rigor |
Additional Context (by tier)
If Self-Hosted:
| Question | Options | Why It Matters |
|---|---|---|
| Platform? | Bare metal, VMs (AWS/GCP/Azure), Kubernetes | Changes installation and start commands |
| If Kubernetes? | Operator (recommended), Helm, Manual StatefulSet | Determines deployment method |
| Node count? | 3 (minimum), 5, 9+ | Affects topology and replication |
| Multi-region? | Yes (how many regions), No | Requires locality flags and topology planning |
| Expected workload? | OLTP, mixed OLTP/analytics, write-heavy | Affects hardware sizing |
| Security requirements? | TLS required, encryption at rest, CMEK | Determines certificate and encryption setup |
If Advanced or BYOC:
| Question | Options | Why It Matters |
|---|---|---|
| Provisioning method? | Cloud Console, Cloud API, Terraform | Determines procedure |
| Cloud provider? | AWS, GCP, Azure | Affects region selection and networking |
| Node count and size? | e.g., 3 nodes x 8 vCPUs | Determines initial capacity |
If Standard: Gather expected workload size (vCPUs) and storage estimate.
If Basic: Gather expected usage pattern and monthly budget.
Context-Driven Routing
| Tier | Go To |
|---|---|
| Self-Hosted | Self-Hosted Provisioning |
| Advanced | Advanced Provisioning |
| BYOC | BYOC Provisioning |
| Standard | Standard Provisioning |
| Basic | Basic Provisioning |
Self-Hosted Provisioning
Applies when: Tier = Self-Hosted
Hardware Sizing
| Component | Minimum | Production Recommended |
|---|---|---|
| Nodes | 3 | 3+ (odd number per failure domain) |
| CPU | 4 vCPUs (non-burstable) | 8+ vCPUs |
| RAM | 16 GB | 32+ GB |
| Storage | 150 GB SSD | 500+ GB NVMe SSD |
| Network | 1 Gbps | 10 Gbps |
Memory formula: --cache + --max-sql-memory <= 75% of total RAM
Recommended: --cache=.25 --max-sql-memory=.25
Never use: burstable instances, HDDs, network-attached HDD, shared CPU.
See hardware-and-infrastructure reference for cloud instance recommendations.
Deploy on VMs / Bare Metal
Step 1: Install CockroachDB on each node
curl https://binaries.cockroachdb.com/cockroach-v<version>.linux-amd64.tgz | tar -xz
cp cockroach-v<version>.linux-amd64/cockroach /usr/local/bin/
Step 2: Generate certificates
cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key
cockroach cert create-node <node-hostname> <node-ip> localhost 127.0.0.1 \
--certs-dir=certs --ca-key=my-safe-directory/ca.key
cockroach cert create-client root --certs-dir=certs --ca-key=my-safe-directory/ca.key
Step 3: Start nodes (repeat on each node)
cockroach start \
--certs-dir=certs \
--store=path=<store-path> \
--listen-addr=<node-address>:26257 \
--http-addr=<node-address>:8080 \
--join=<node1-address>,<node2-address>,<node3-address> \
--locality=region=<region>,zone=<zone> \
--cache=.25 \
--max-sql-memory=.25 \
--background
Step 4: Initialize cluster (once, from any node)
cockroach init --certs-dir=certs --host=<any-node-address>
Step 5: Verify
cockroach node status --certs-dir=certs --host=<any-node-address>
Every node started in step 3 should appear with is_live = true and the expected locality.
Deploy on Kubernetes
Operator (recommended):
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/install/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/install/operator.yaml
# Apply CrdbCluster manifest with node count, resources, and storage
Helm:
helm repo add cockroachdb https://charts.cockroachdb.com/
helm install cockroachdb cockroachdb/cockroachdb \
--set statefulset.replicas=3 \
--set storage.persistentVolume.size=100Gi
Production Configuration (Self-Hosted)
After cluster is running, apply production settings:
-- Enable critical features
SET CLUSTER SETTING kv.rangefeed.enabled = true;
SET CLUSTER SETTING sql.stats.automatic_collection.enabled = true;
SET CLUSTER SETTING admission.kv.enabled = true;
-- Set timeouts
SET CLUSTER SETTING sql.defaults.idle_in_transaction_session_timeout = '300s';
SET CLUSTER SETTING sql.defaults.statement_timeout = '30s';
-- Install enterprise license (if applicable)
SET CLUSTER SETTING cluster.organization = '<org-name>';
SET CLUSTER SETTING enterprise.license = '<license-key>';
Create ballast files on each node:
cockroach debug ballast <store-path>/auxiliary/EMERGENCY_BALLAST --size=1GiB
Configure load balancer: Point to all nodes with health check on /health?ready=1.
See production-deployment-checklist reference for the full go-live checklist.
Advanced Provisioning
Applies when: Tier = Advanced
Via Cloud Console
- cockroachlabs.cloud → Create Cluster
- Select Advanced plan
- Choose cloud provider (AWS, GCP, Azure)
- Select region(s)
- Configure node count (minimum 3) and machine size (vCPUs per node)
- Configure storage
- Review and create
Via Cloud API
curl -X POST -H "Authorization: Bearer $COCKROACH_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "<cluster-name>",
"provider": "AWS",
"spec": {
"dedicated": {
"region_nodes": {"us-east-1": 3},
"machine_type": "m6i.xlarge",
"storage_gib": 150
}
}
}' \
"https://cockroachlabs.cloud/api/v1/clusters"
Via Terraform
resource "cockroach_cluster" "production" {
name = "production"
cloud_provider = "AWS"
dedicated {
num_virtual_cpus = 8
storage_gib = 150
num_nodes = 3
}
regions = [{
name = "us-east-1"
}]
}
Post-Provisioning
- Configure IP allowlists or VPC Peering/PrivateLink
- Create SQL users and databases
- Set maintenance window (see performing-cluster-maintenance)
- Configure metrics export to Datadog/Prometheus if needed
BYOC Provisioning
Applies when: Tier = BYOC
Follow Advanced Provisioning steps — BYOC uses the same Cloud Console, API, and Terraform interfaces.
Additional BYOC steps:
- Ensure your cloud account meets CRL prerequisites (service account, VPC, IAM roles)
- Configure PrivateLink/PSC for private connectivity
- Verify CRL service account permissions
Standard Provisioning
Applies when: Tier = Standard
- cockroachlabs.cloud → Create Cluster
- Select Standard plan
- Choose cloud provider and region
- Set provisioned compute (vCPUs) based on expected workload
- Create
Post-provisioning:
- Create SQL users and databases
- Configure IP allowlists
- Set session-level defaults:
ALTER ROLE ALL SET statement_timeout = '30s'; ALTER ROLE ALL SET idle_in_transaction_session_timeout = '300s';
Basic Provisioning
Applies when: Tier = Basic
- cockroachlabs.cloud → Create Cluster
- Select Basic plan
- Choose cloud provider and region
- Create (auto-scales, no sizing needed)
Post-provisioning:
- Set spending limits (Cloud Console → Cluster → Settings)
- Create SQL users and databases
- Configure IP allowlists
Safety Considerations
| Operation | Tier | Risk |
|---|---|---|
cockroach init |
SH | Safe — only runs once; subsequent calls are no-ops |
| Certificate generation | SH | Store CA key securely — loss means no new certs |
| Cloud cluster creation | ADV/BYOC/STD/BAS | Safe — can be deleted if misconfigured |
| Production settings changes | SH | See managing-cluster-settings |
Critical (Self-Hosted):
- Never use
--insecurein production — always use TLS - Never use burstable instances for production workloads
- Always set
--localityflags for multi-node clusters - Always configure
--cacheand--max-sql-memory(defaults are too low) - Always create ballast files before going to production
Troubleshooting
| Issue | Tier | Fix |
|---|---|---|
cockroach init fails |
SH | Check all nodes are started and reachable on port 26257 |
| Node won't join cluster | SH | Verify --join addresses; check firewall rules for ports 26257, 8080 |
| "clock offset" error | SH | Sync clocks with NTP; check --max-offset setting |
| TLS handshake failure | SH | Verify certs match; check CA is the same across all nodes |
| Cloud cluster stuck in "Creating" | ADV/BYOC | Wait 15 min; contact support if no progress |
| Cannot connect after creation | ALL | Check IP allowlist; verify connection string; try with root user |
References
Skill references:
Related skills:
- reviewing-cluster-health — Post-deployment health check
- managing-cluster-settings — Production settings
- managing-certificates-and-encryption — TLS setup
- managing-cluster-capacity — Scaling after deployment
Official CockroachDB Documentation: