cert-manager-troubleshooting

star 12

Use when TLS certificates are not being issued or renewed, when Certificate or CertificateRequest resources show errors, when ACME challenges fail, or when Issuer or ClusterIssuer resources are not ready

foxj77 By foxj77 schedule Updated 2/9/2026

name: cert-manager-troubleshooting description: Use when TLS certificates are not being issued or renewed, when Certificate or CertificateRequest resources show errors, when ACME challenges fail, or when Issuer or ClusterIssuer resources are not ready

cert-manager Troubleshooting

Diagnose failures in cert-manager — the controller that automates TLS certificate issuance and renewal from sources like Let's Encrypt (ACME), HashiCorp Vault, Venafi, and self-signed CAs.

Keywords

cert-manager, certificate, tls, ssl, https, letsencrypt, acme, issuer, clusterissuer, certificaterequest, order, challenge, http01, dns01, self-signed, ca, renew, renewal, expired, not-ready, ingress, annotation

When to Use This Skill

  • Certificate resources show Ready: False or Issuing for too long
  • CertificateRequest is stuck or shows errors
  • ACME challenges (HTTP-01 or DNS-01) are failing
  • Issuer or ClusterIssuer shows Ready: False
  • TLS secrets are not being created or contain expired certificates
  • Ingress TLS annotations are not triggering certificate creation
  • Certificate renewal is not happening before expiry

When NOT to Use

Related Skills

Quick Reference

Task Command
Check cert-manager pods kubectl get pods -n cert-manager
List Certificates kubectl get certificate -A
List Issuers kubectl get issuer -A && kubectl get clusterissuer
Check CertificateRequests kubectl get certificaterequest -A
Check Orders (ACME) kubectl get order -A
Check Challenges (ACME) kubectl get challenge -A
View cert-manager logs kubectl logs -n cert-manager deploy/cert-manager --tail=200
Check cert expiry kubectl get secret ${SECRET} -n ${NS} -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates

Diagnostic Workflow

TLS certificate not working?
├─ Certificate resource exists?
│   ├─ No → Check Ingress annotations or create Certificate (Section 4)
│   └─ Yes → Check Certificate status
│       ├─ Ready: True → Certificate issued, check if correct (Section 6)
│       └─ Ready: False → Check CertificateRequest
│           ├─ No CertificateRequest → Issuer reference broken (Section 2)
│           ├─ CertificateRequest pending → Check Issuer health (Section 2)
│           └─ CertificateRequest denied → Check approval/RBAC
├─ Using ACME (Let's Encrypt)?
│   ├─ Order created? → Check Order status
│   │   ├─ Order pending → Check Challenge status (Section 3)
│   │   ├─ Order invalid → All challenges failed
│   │   └─ Order errored → ACME account issue (Section 2)
│   └─ Challenge status?
│       ├─ HTTP-01 pending → Solver pod/ingress issues (Section 3)
│       └─ DNS-01 pending → DNS provider issues (Section 3)
└─ Certificate expired?
    └─ Renewal not triggered → Check renewal config (Section 5)

Section 1: Controller Health

# Check all cert-manager components
kubectl get pods -n cert-manager -o wide

# Controller logs (main component)
kubectl logs -n cert-manager deploy/cert-manager --tail=200

# Webhook logs (validates/converts resources)
kubectl logs -n cert-manager deploy/cert-manager-webhook --tail=100

# CA injector logs (injects CA bundles)
kubectl logs -n cert-manager deploy/cert-manager-cainjector --tail=100

# Check CRDs
kubectl get crd | grep cert-manager

# Check webhook connectivity
kubectl get validatingwebhookconfigurations | grep cert-manager
kubectl get mutatingwebhookconfigurations | grep cert-manager

Controller Issues

Symptom Cause Diagnostic
Certificate stuck at "Issuing" Controller not processing Check controller logs for errors
Any cert-manager resource creation fails Webhook unreachable Check webhook pod, Service, and network policies
"conversion webhook" errors CRD version mismatch Check if cert-manager version matches CRD version
Leader election errors Multiple controllers competing Check for duplicate installations

Section 2: Issuer and ClusterIssuer

# Check Issuer status
kubectl get issuer -n ${NS} -o wide
kubectl describe issuer ${ISSUER_NAME} -n ${NS}

# Check ClusterIssuer status
kubectl get clusterissuer -o wide
kubectl describe clusterissuer ${ISSUER_NAME}

# Check the Certificate's issuer reference
kubectl get certificate ${CERT_NAME} -n ${NS} -o jsonpath='{.spec.issuerRef}'

ACME Issuer (Let's Encrypt)

# Check ACME account registration
kubectl describe issuer ${ISSUER_NAME} -n ${NS} | grep -A10 "Status:"

# Check ACME account secret
kubectl get secret ${ISSUER_NAME}-account-key -n ${NS} 2>/dev/null

# Verify ACME server URL
kubectl get issuer ${ISSUER_NAME} -n ${NS} -o jsonpath='{.spec.acme.server}'
ACME Server URL Use Case
Let's Encrypt Production https://acme-v02.api.letsencrypt.org/directory Production certs
Let's Encrypt Staging https://acme-staging-v02.api.letsencrypt.org/directory Testing (fake certs, higher rate limits)
Error Cause Diagnostic
"account not registered" ACME registration failed Check email config and account secret status
"invalid account" Account key mismatch Check if account key secret is corrupted or stale
Issuer Ready: False Can't reach ACME server Check network/proxy connectivity and verify server URL

CA Issuer

# Check CA secret exists and has required keys
kubectl get secret ${CA_SECRET} -n ${NS}
kubectl get secret ${CA_SECRET} -n ${NS} -o jsonpath='{.data}' | jq 'keys'
# Must contain: tls.crt, tls.key

Vault Issuer

# Check Vault connection
kubectl describe issuer ${ISSUER_NAME} -n ${NS} | grep -A20 "Spec:"

# Logs for Vault errors
kubectl logs -n cert-manager deploy/cert-manager --tail=200 | grep -iE 'vault|pki|sign|auth'

Section 3: ACME Challenges (HTTP-01 and DNS-01)

# List all challenges
kubectl get challenge -A -o wide

# Describe failing challenge
kubectl describe challenge ${CHALLENGE_NAME} -n ${NS}

# Check challenge state
kubectl get challenge -A -o json | jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name)\t\(.spec.type)\t\(.status.state)\t\(.status.reason // "no reason")"'

HTTP-01 Challenge Troubleshooting

# Check solver pod
kubectl get pods -n ${NS} -l acme.cert-manager.io/http01-solver=true

# Check solver Ingress/Service
kubectl get ingress -n ${NS} -l acme.cert-manager.io/http01-solver=true
kubectl get svc -n ${NS} -l acme.cert-manager.io/http01-solver=true

# Test challenge URL locally
# The URL is: http://<domain>/.well-known/acme-challenge/<token>
kubectl get challenge ${CHALLENGE_NAME} -n ${NS} -o jsonpath='{.spec.token}'
HTTP-01 Problem Cause Diagnostic
Solver pod not created RBAC or class annotation issue Check cert-manager logs and verify ingress class
Solver pod pending Resource constraints Check node capacity and resource quotas
Challenge URL returns 404 Ingress not routing to solver Check ingress class, annotations, and controller
Challenge URL unreachable Firewall blocks port 80 Verify port 80 is open from internet to ingress controller
"wrong status code: 503" Load balancer health check failing Check LB health check config and solver pod readiness
Challenge stays pending Let's Encrypt can't reach cluster Verify domain resolves to cluster ingress IP

DNS-01 Challenge Troubleshooting

# Check which DNS provider is configured
kubectl get issuer ${ISSUER_NAME} -n ${NS} -o json | jq '.spec.acme.solvers[].dns01'

# Check for DNS provider credentials
kubectl get challenge ${CHALLENGE_NAME} -n ${NS} -o json | jq '.spec.solver.dns01'

# Verify TXT record was created
dig TXT _acme-challenge.${DOMAIN} @8.8.8.8

# Check cert-manager DNS provider logs
kubectl logs -n cert-manager deploy/cert-manager --tail=300 | grep -iE 'dns|challenge|present|cleanup|txt'
DNS-01 Problem Cause Diagnostic
TXT record not created Provider auth failed Check DNS provider credentials in Issuer
TXT record exists but challenge fails DNS propagation delay Check nameserver configuration and --dns01-recursive-nameservers-only setting
Wrong zone DNS split-horizon or sub-domain issue Check --dns01-recursive-nameservers setting
"zone not found" Provider can't find the DNS zone Verify zone exists and credentials have access
Cleanup failed (TXT not removed) Provider API error on delete Check for stale _acme-challenge TXT records at provider

Common DNS-01 Providers

Provider Secret Type Key Fields
Route53 AWS credentials or IRSA accessKeyID, secretAccessKeySecretRef, or IRSA annotation
Azure DNS Service principal or Workload Identity clientID, clientSecretSecretRef, subscriptionID, resourceGroupName
CloudFlare API token apiTokenSecretRef
Google Cloud DNS Service account key serviceAccountSecretRef, project

Section 4: Certificate and Ingress Configuration

# Check Certificate spec
kubectl get certificate ${CERT_NAME} -n ${NS} -o yaml

# Check CertificateRequest chain
kubectl get certificaterequest -n ${NS} -o wide
kubectl describe certificaterequest ${CR_NAME} -n ${NS}

# Check if Ingress annotations trigger cert creation
kubectl get ingress -n ${NS} -o json | jq '.items[] | {name: .metadata.name, annotations: .metadata.annotations, tls: .spec.tls}'

Ingress Annotation Issues

Annotation Purpose Common Mistake
cert-manager.io/issuer Namespace Issuer Using ClusterIssuer name with this annotation
cert-manager.io/cluster-issuer ClusterIssuer Typo in name or using Issuer name
cert-manager.io/issue-temporary-certificate Temp cert during issuance Forgetting this causes TLS errors during initial issuance
cert-manager.io/common-name Override CN Ignored if dnsNames are set

Certificate Spec Problems

Problem Cause Diagnostic
"issuer not found" Wrong issuerRef.name or issuerRef.kind Verify issuer exists and kind matches (Issuer vs ClusterIssuer)
"dnsNames required" No domains specified Check if dnsNames is present in Certificate spec
"secret already exists" Conflict with manually created Secret Check if a Secret with the same name already exists
CertificateRequest not created Spec validation failed Check cert-manager webhook logs

Section 5: Renewal and Expiry

# Check certificate dates
kubectl get certificate -A -o json | jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name)\t\(.status.notAfter)\t\(.status.renewalTime)\tReady=\(.status.conditions[0].status)"'

# Check from the actual TLS secret
kubectl get secret ${SECRET} -n ${NS} -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates -subject

# To force renewal, recommend to user:
# kubectl cert-manager renew ${CERT_NAME} -n ${NS}
Problem Cause Diagnostic
Not renewing before expiry renewBefore too short or controller stopped Check renewBefore setting (default: 2/3 of cert lifetime)
Renewed but old cert in Secret Secret not updated Check if Secret was manually modified; recommend deleting Secret so cert-manager recreates it
Let's Encrypt rate limited Too many issuances for same domain Check describe order for rate limit messages; recommend using staging for testing
Wildcard cert not renewing DNS-01 solver broken Check DNS-01 challenge flow specifically

Let's Encrypt Rate Limits

Limit Value Scope
Certificates per domain 50/week Registered domain (e.g., example.com)
Duplicate certificates 5/week Same set of domain names
Failed validations 5/hour Per account, per hostname
New registrations 10/3 hours Per IP

Section 6: Verifying Issued Certificates

# Check certificate details from the K8s Secret
kubectl get secret ${SECRET} -n ${NS} -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -text | head -30

# Verify cert chain
kubectl get secret ${SECRET} -n ${NS} -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl verify -show_chain

# Check cert matches key
CERT_MOD=$(kubectl get secret ${SECRET} -n ${NS} -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -modulus | md5)
KEY_MOD=$(kubectl get secret ${SECRET} -n ${NS} -o jsonpath='{.data.tls\.key}' | base64 -d | openssl rsa -noout -modulus | md5)
echo "Match: $([ "$CERT_MOD" = "$KEY_MOD" ] && echo YES || echo NO)"

# Check if staging cert (not trusted in browsers)
kubectl get secret ${SECRET} -n ${NS} -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -issuer
# Staging issuer contains "(STAGING)" or "Fake LE"
Problem Cause Diagnostic
Browser shows "Not Trusted" Using Let's Encrypt staging issuer Check issuer URL in Issuer/ClusterIssuer spec; verify if it points to staging (acme-staging-v02)
Wrong SANs on cert dnsNames in Certificate don't match Ingress hosts Compare Certificate dnsNames with Ingress host rules; recommend updating Certificate to match
Cert issued but Ingress uses old one Ingress controller cached old cert Check Secret tls.crt issue date vs Ingress annotation; recommend restarting ingress controller or waiting for cache TTL
"certificate signed by unknown authority" CA cert not in client trust store Verify issuer chain in the certificate; recommend adding CA cert to client trust store or using ca.crt from Secret

MCP Tools Available

When the appropriate MCP servers are connected, prefer these over raw kubectl where available:

  • mcp__flux-operator-mcp__get_kubernetes_resources - Query Certificates, CertificateRequests, Orders, Challenges, Issuers
  • mcp__flux-operator-mcp__get_kubernetes_logs - Retrieve cert-manager controller, webhook, and cainjector logs
  • mcp__flux-operator-mcp__get_kubernetes_metrics - Check cert-manager resource consumption

Common Mistakes

Mistake Why It Fails Instead
Testing with Let's Encrypt production Hits rate limits quickly during debugging Use acme-staging-v02.api.letsencrypt.org until everything works, then switch
Wildcard cert with HTTP-01 HTTP-01 doesn't support wildcards, only specific domains Use DNS-01 solver for wildcard certificates (*.example.com)
Forgetting port 80 for HTTP-01 Let's Encrypt must reach /.well-known/acme-challenge/ over HTTP Ensure port 80 is open from the internet to the ingress controller
Using cert-manager.io/issuer for a ClusterIssuer Ingress annotation looks for a namespace-scoped Issuer Use cert-manager.io/cluster-issuer annotation instead
Deleting and recreating Certificate for renewal Causes a new ACME order, consuming rate limits Recommend using kubectl cert-manager renew to trigger renewal of existing cert
Not checking Order and Challenge resources Debugging the Certificate when the problem is in the ACME challenge Follow the chain: Certificate → CertificateRequest → Order → Challenge

Behavioural Guidelines

  1. Follow the resource chain — Certificate → CertificateRequest → Order → Challenge. Diagnose at the deepest failing layer.
  2. Never display raw private key contents — Examining certificate metadata (dates, SANs, issuer) is safe. Modulus comparison (openssl rsa -noout -modulus) for cert/key matching is acceptable, but never print full tls.key data.
  3. Check the Issuer first — If the Issuer is Ready: False, no Certificate can be issued.
  4. Distinguish staging from production — Staging certs are not browser-trusted but have much higher rate limits for testing.
  5. Watch for rate limits — Let's Encrypt production has strict limits. Check kubectl describe order for rate limit messages before retrying.
  6. Verify DNS resolution — For both HTTP-01 and DNS-01, the domain must resolve correctly. Use dig to verify.
  7. Check all cert-manager components — The controller, webhook, and cainjector all play different roles. A webhook failure blocks resource creation entirely.
Install via CLI
npx skills add https://github.com/foxj77/claude-code-skills --skill cert-manager-troubleshooting
Repository Details
star Stars 12
call_split Forks 1
navigation Branch main
article Path SKILL.md
More from Creator