name: service-mesh-istio
description: Deploy and configure Istio service mesh for Kubernetes. Covers mTLS between services, traffic management with VirtualServices/DestinationRules, observability with Kiali/Jaeger/Prometheus, circuit breaking, and zero-trust networking.
version: 1.0.0
tags: [istio, service-mesh, kubernetes, mtls, traffic-management, observability, zero-trust]
Service Mesh with Istio
Overview
Istio is a service mesh that injects an Envoy sidecar proxy into every meshed pod in Kubernetes, enabling mutual TLS (mTLS) between services, fine-grained traffic control, distributed tracing, and policy enforcement, all without changing application logic (services only need to forward trace headers for traces to join up). It supports zero-trust networking (with STRICT mTLS, all service-to-service traffic is authenticated and encrypted) and provides circuit breaking, retries, and canary deployments at the infrastructure layer.
When to Use
- Enforcing mTLS for all service-to-service communication in Kubernetes
- Canary deployments or A/B testing at the infrastructure layer (no code changes needed)
- Distributed tracing across all microservices without adding a tracing SDK to each service (services must still forward trace headers)
- Circuit breaking and retry logic without implementing it in every service
- Enforcing network policies (which services can talk to which) via AuthorizationPolicy
Step-by-Step Workflow
1. Istio Installation
# Download istioctl (pin the version so the directory name below matches)
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.21.0 sh -
cd istio-1.21.0
export PATH=$PWD/bin:$PATH
# Install Istio with default profile (good for production)
istioctl install --set profile=default -y
# Or install with demo profile (evaluation only: adds the egress gateway and verbose tracing/access logging; observability addons are installed separately below)
istioctl install --set profile=demo -y
# Enable automatic sidecar injection for namespace
kubectl label namespace production istio-injection=enabled
# Verify installation
istioctl verify-install
kubectl get pods -n istio-system
# Install observability addons
kubectl apply -f samples/addons/prometheus.yaml
kubectl apply -f samples/addons/grafana.yaml
kubectl apply -f samples/addons/jaeger.yaml
kubectl apply -f samples/addons/kiali.yaml
# Open dashboards
istioctl dashboard kiali # Service graph
istioctl dashboard grafana # Metrics
istioctl dashboard jaeger # Traces
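Labeling a namespace only affects pods created afterwards; pods that were already running keep their original spec until they are restarted. The following is a minimal sketch for spotting them. It assumes the sidecar shows up as a regular container named istio-proxy (setups that run it as a native sidecar list it under init containers instead), and the helper name pods_missing_sidecar is hypothetical.

```shell
#!/bin/bash
# Print the names of pods whose container list lacks istio-proxy.
# Input (stdin): one line per pod, "<pod-name> <container> <container> ...".
pods_missing_sidecar() {
  awk 'NF == 0 { next }
       { found = 0
         for (i = 2; i <= NF; i++) if ($i == "istio-proxy") found = 1
         if (!found) print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pods -n production \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[*].name}{"\n"}{end}' \
#     | pods_missing_sidecar
```

Pods reported here can be re-created with kubectl rollout restart on their owning Deployment so that the injector adds the sidecar.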
2. mTLS and PeerAuthentication
# Enable strict mTLS for entire mesh (all services must use mTLS)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system # Mesh-wide policy
spec:
  mtls:
    mode: STRICT # STRICT = only mTLS allowed; PERMISSIVE = both
---
# Namespace-level exception (e.g., legacy service that can't do mTLS)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: allow-plaintext
  namespace: legacy
spec:
  mtls:
    mode: PERMISSIVE
---
# AuthorizationPolicy: which services can call which
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: order-service-policy
  namespace: production
spec:
  selector:
    matchLabels:
      app: order-service
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/production/sa/api-gateway"
        - "cluster.local/ns/production/sa/checkout-service"
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/orders", "/api/orders/*"]
  - from:
    - source:
        principals:
        - "cluster.local/ns/monitoring/sa/prometheus"
    to:
    - operation:
        paths: ["/metrics"]
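The ALLOW policy above only restricts order-service; workloads that have no AuthorizationPolicy still accept requests from any caller in the mesh. For a zero-trust baseline, one common approach is a namespace-wide policy with an empty spec: it matches no request, so everything is denied unless another ALLOW policy permits it. A sketch, with the name allow-nothing chosen for illustration:

```yaml
# Deny-by-default baseline for the production namespace.
# An ALLOW policy with no rules never matches, so only requests
# matched by other ALLOW policies (such as order-service-policy) get through.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing   # illustrative name
  namespace: production
spec: {}
```

Apply this only after every workload in the namespace has an explicit ALLOW policy, otherwise its traffic is rejected with 403 responses.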
3. Traffic Management — VirtualService and DestinationRule
# DestinationRule: defines subsets (versions) and load balancing
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: product-service
  namespace: production
spec:
  host: product-service
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST # or ROUND_ROBIN, RANDOM (LEAST_CONN is deprecated)
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 200
    outlierDetection: # Circuit breaker
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
# VirtualService: traffic routing rules
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: product-service
  namespace: production
spec:
  hosts:
  - product-service
  http:
  # Header-based routing (for QA testing of v2)
  - match:
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination:
        host: product-service
        subset: v2
  # Canary: 10% traffic to v2, 90% to v1
  - route:
    - destination:
        host: product-service
        subset: v1
      weight: 90
    - destination:
        host: product-service
        subset: v2
      weight: 10
    timeout: 5s
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: "gateway-error,connect-failure,retriable-4xx"
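The retry settings interact with the route timeout: the 5s timeout covers the whole request including retries, so it can cut off later tries before they finish. A quick sanity check is sketched below. It assumes attempts counts retries in addition to the initial try and ignores the backoff between tries; the helper name worst_case_seconds is hypothetical.

```shell
#!/bin/bash
# worst_case_seconds <attempts> <perTryTimeout-seconds>
# Upper bound on time needed if the initial try and every retry hit perTryTimeout.
worst_case_seconds() {
  echo $(( ($1 + 1) * $2 ))
}

timeout=5                          # route-level timeout from the VirtualService
worst=$(worst_case_seconds 3 2)    # attempts: 3, perTryTimeout: 2s
if [ "$worst" -gt "$timeout" ]; then
  echo "retries can need ${worst}s but the route timeout is ${timeout}s; later tries will be cut off"
fi
```

If every retry should be able to run to completion, either raise timeout or lower attempts or perTryTimeout until the worst case fits.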
4. Ingress Gateway
# Gateway: external traffic entry point
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: api-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: api-tls-secret # K8s secret with TLS cert
    hosts:
    - "api.example.com"
  - port:
      number: 80
      name: http
      protocol: HTTP
    tls:
      httpsRedirect: true
    hosts:
    - "api.example.com"
---
# VirtualService for external routing
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: external-routing
  namespace: production
spec:
  hosts:
  - "api.example.com"
  gateways:
  - istio-system/api-gateway
  http:
  - match:
    - uri:
        prefix: "/api/v1/orders"
    route:
    - destination:
        host: order-service
        port:
          number: 5000
  - match:
    - uri:
        prefix: "/api/v1/products"
    route:
    - destination:
        host: product-service
        port:
          number: 4000
5. Fault Injection (Chaos Testing)
# Inject latency for 50% of requests to product-service
# Note: sidecars do not merge multiple VirtualServices for the same host,
# so apply this in place of the routing VirtualService from step 3, not alongside it.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: product-service-chaos
  namespace: production
spec:
  hosts:
  - product-service
  http:
  - fault:
      delay:
        percentage:
          value: 50
        fixedDelay: 3s
      abort:
        percentage:
          value: 5
        httpStatus: 500 # Inject 500 errors for 5% of requests
    route:
    - destination:
        host: product-service
        subset: v1
6. Observability Configuration
# Telemetry: customize tracing sampling rate
# The "jaeger" provider must be declared under meshConfig.extensionProviders;
# "prometheus" and "envoy" are built-in providers.
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  tracing:
  - providers:
    - name: jaeger
    randomSamplingPercentage: 1.0 # 1% sampling in production
  metrics:
  - providers:
    - name: prometheus
  accessLogging:
  - providers:
    - name: envoy
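The Telemetry resource refers to tracing providers by name, so a provider called jaeger has to exist in the mesh config before the tracing section takes effect. A sketch of that declaration follows. It assumes the zipkin Service on port 9411 that samples/addons/jaeger.yaml creates in istio-system; adjust the service and port if your collector is deployed differently.

```yaml
# IstioOperator overlay declaring a tracing provider named "jaeger".
# Assumption: Jaeger's Zipkin-compatible endpoint is reachable at
# zipkin.istio-system.svc.cluster.local:9411 (as set up by the sample addon).
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    extensionProviders:
    - name: jaeger
      zipkin:
        service: zipkin.istio-system.svc.cluster.local
        port: 9411
```

The overlay can be passed to istioctl install with -f alongside the chosen profile.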
# Query Prometheus metrics
kubectl port-forward svc/prometheus -n istio-system 9090:9090
# Useful PromQL for Istio metrics:
# Request rate by service
sum(rate(istio_requests_total[5m])) by (destination_service)
# Error rate (5xx)
sum(rate(istio_requests_total{response_code=~"5.*"}[5m])) by (destination_service)
/
sum(rate(istio_requests_total[5m])) by (destination_service)
# P99 latency by service (buckets must be summed by le before taking the quantile)
histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket[5m])) by (le, destination_service))
Key Commands Reference
# Check Istio proxy status
istioctl proxy-status # All sidecars sync status
istioctl proxy-config all pod/my-pod # Full sidecar config
istioctl proxy-config route pod/my-pod # Route config
# Analyze config for issues
istioctl analyze # Analyze all namespaces
istioctl analyze -n production # Specific namespace
# Check mTLS status (istioctl authn tls-check was removed; describe reports the mTLS mode that applies to the pod)
istioctl x describe pod my-pod -n production
# Describe routing for a service
istioctl x describe service product-service.production
# View access logs (sidecar)
kubectl logs -n production deploy/product-service -c istio-proxy --tail=50
# Exec into sidecar for debugging
kubectl exec -it pod/my-pod -c istio-proxy -- curl -sS localhost:15000/config_dump
# Traffic stats
kubectl exec -it pod/my-pod -c istio-proxy -- curl localhost:15000/stats | grep upstream
Common Patterns
Pattern 1: Gradual Canary Rollout Script
#!/bin/bash
# Gradually shift traffic from v1 to v2.
# Assumes the product-service VirtualService from step 3, where http[0] is the
# header-match rule and http[1] is the weighted v1/v2 route.
# Assumes Prometheus is reachable on localhost:9090, e.g. via
#   kubectl port-forward svc/prometheus -n istio-system 9090:9090
set -euo pipefail

QUERY='sum(rate(istio_requests_total{response_code=~"5..",destination_service_name="product-service"}[5m])) / sum(rate(istio_requests_total{destination_service_name="product-service"}[5m]))'

set_weights() { # $1 = v1 weight, $2 = v2 weight
  kubectl patch virtualservice product-service -n production --type=json \
    -p="[
      {\"op\": \"replace\", \"path\": \"/spec/http/1/route/0/weight\", \"value\": $1},
      {\"op\": \"replace\", \"path\": \"/spec/http/1/route/1/weight\", \"value\": $2}
    ]"
}

for weight in 10 25 50 75 100; do
  echo "Routing ${weight}% to v2..."
  set_weights $((100 - weight)) "${weight}"
  # Monitor error rate for 5 minutes
  sleep 300
  # -G with --data-urlencode keeps curl from globbing the braces and brackets in the PromQL
  ERROR_RATE=$(curl -s -G "localhost:9090/api/v1/query" --data-urlencode "query=${QUERY}" \
    | jq -r '.data.result[0].value[1] // "0"')
  echo "Error rate: ${ERROR_RATE}"
  if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
    echo "ERROR RATE TOO HIGH - Rolling back"
    set_weights 100 0
    exit 1
  fi
done
echo "Canary complete - 100% on v2"
Pattern 2: JWT Authentication at the Mesh Level
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  jwtRules:
  - issuer: "https://accounts.example.com"
    jwksUri: "https://accounts.example.com/.well-known/jwks.json"
    audiences: ["api.example.com"]
    forwardOriginalToken: true
---
# Deny unauthenticated requests
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: DENY
  rules:
  - from:
    - source:
        notRequestPrincipals: ["*"] # No valid JWT
Pattern 3: Rate Limiting with EnvoyFilter
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: filter-ratelimit
  namespace: production
spec:
  workloadSelector:
    labels:
      app: product-service
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: "envoy.filters.network.http_connection_manager"
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.local_ratelimit
        typed_config:
          "@type": type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
          value:
            stat_prefix: http_local_rate_limiter
            token_bucket:
              max_tokens: 100
              tokens_per_fill: 100
              fill_interval: 60s
            filter_enabled:
              runtime_key: local_rate_limit_enabled
              default_value:
                numerator: 100
                denominator: HUNDRED
            # Without filter_enforced the filter only counts requests; enforcement defaults to 0%
            filter_enforced:
              runtime_key: local_rate_limit_enforced
              default_value:
                numerator: 100
                denominator: HUNDRED
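The local rate limit filter keeps its token bucket inside each Envoy instance, so the 100 requests per 60s above apply per pod rather than to the service as a whole. A rough estimate of the aggregate ceiling is sketched below, assuming traffic is spread evenly across replicas; the replica count and the helper name aggregate_limit are hypothetical.

```shell
#!/bin/bash
# aggregate_limit <replicas> <max_tokens per fill_interval>
# Approximate service-wide ceiling when each replica enforces its own bucket.
aggregate_limit() {
  echo $(( $1 * $2 ))
}

echo "3 replicas x 100 req/min = $(aggregate_limit 3 100) req/min across the service"
```

If a single service-wide limit is required regardless of replica count, a per-pod bucket is not sufficient on its own and a shared (global) rate limit service is the usual alternative.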
Pitfalls to Avoid
- Enabling strict mTLS before all services have sidecars: If you switch to STRICT mTLS mesh-wide while some pods lack the Istio sidecar (e.g., system pods, legacy deployments), those services can no longer communicate. Always audit istioctl proxy-status first, label namespaces incrementally, and use PERMISSIVE mode during migration before switching to STRICT.
- VirtualService/DestinationRule host mismatches: The host in a DestinationRule must exactly match the Kubernetes service name (as it would appear in DNS: service-name, service-name.namespace, or the full FQDN). A mismatch silently causes rules to not apply: traffic works but without the circuit breaking, retries, or routing you configured. Use istioctl analyze to catch these.
- Sidecar resource overhead: Each Envoy sidecar consumes ~50MB RAM and ~0.1 CPU. A cluster with 500 pods adds ~25GB RAM and 50 CPU cores just for sidecars. Right-size your cluster before enabling Istio mesh-wide. Use Sidecar resources to restrict which services each proxy tracks; by default, Envoy tracks the entire service registry, which grows linearly with cluster size.
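A minimal sketch of the Sidecar resource mentioned in the last pitfall, assuming workloads in production only call services in their own namespace and in istio-system; widen the hosts list if they depend on other namespaces.

```yaml
# Namespace-wide Sidecar (no workloadSelector): limits the services that
# each proxy in "production" receives configuration for.
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: production
spec:
  egress:
  - hosts:
    - "./*"              # services in the same namespace
    - "istio-system/*"   # control plane and gateways
```

Calls to hosts outside this list are no longer routed by mesh configuration, so roll it out one namespace at a time and check istioctl proxy-config cluster output for the destinations you expect.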
Related Skills
- kubernetes-architect: Kubernetes foundations before adding service mesh
- api-gateway-design: Edge gateway (Istio handles internal; API gateway handles external)
- opentelemetry-instrumentation: Application-level tracing alongside Istio's infrastructure tracing
- chaos-engineering: Using Istio fault injection in chaos experiments
GitNexus Index
{
  "skill": "service-mesh-istio",
  "category": "devops",
  "triggers": ["istio", "service mesh", "mtls kubernetes", "traffic management", "envoy proxy", "zero trust kubernetes", "canary istio"],
  "outputs": ["PeerAuthentication", "VirtualService", "DestinationRule", "AuthorizationPolicy", "Gateway"],
  "complexity": "high",
  "tools": ["istio", "kubernetes", "envoy", "prometheus", "jaeger", "kiali"]
}