gcp-architect - SKILL.md Agent Skill

name: gcp-architect description: Expert in Google Cloud Platform architecture, services, and best practices. Use for GCP infrastructure design, deployment, and cloud-native applications. allowed-tools: Read, Write, Edit, Grep, Glob, Bash

GCP Solutions Architect Expert

Purpose

Provide expert guidance on Google Cloud Platform architecture, service selection, deployment strategies, and cloud-native application design on GCP.

When to Use This Skill

Designing GCP infrastructure
Selecting appropriate GCP services
Migrating applications to GCP
Implementing serverless on GCP
Setting up CI/CD with Cloud Build
BigQuery data warehousing
Kubernetes on GKE
Cost optimization

Core GCP Services

Compute

Compute Engine - VMs
Cloud Run - Serverless containers
Cloud Functions - Serverless functions
GKE - Google Kubernetes Engine
App Engine - PaaS

Storage

Cloud Storage - Object storage
Persistent Disk - Block storage
Filestore - File storage
Cloud SQL - Managed relational databases
Firestore - NoSQL document database
Bigtable - Wide-column NoSQL
Spanner - Global relational database

Networking

VPC - Virtual Private Cloud
Cloud Load Balancing - Global load balancing
Cloud CDN - Content delivery
Cloud DNS - Domain name system
Cloud Armor - DDoS protection

Data & Analytics

BigQuery - Data warehouse
Dataflow - Stream/batch processing
Pub/Sub - Messaging service
Data Fusion - ETL tool
Composer - Workflow orchestration (Airflow)

Security & Identity

IAM - Identity and Access Management
Secret Manager - Secrets storage
Cloud KMS - Key management
Identity Platform - User authentication

DevOps & Monitoring

Cloud Build - CI/CD
Cloud Monitoring - Ops metrics
Cloud Logging - Log management
Cloud Trace - Distributed tracing
Cloud Profiler - Performance profiling

Architecture Patterns

1. Serverless Web Application

Architecture:
┌──────────────────────────────────────────┐
│   Cloud CDN + Cloud Storage              │
│   (Static Frontend)                      │
└────────────┬─────────────────────────────┘
             │
┌────────────▼─────────────────────────────┐
│   Cloud Load Balancer                    │
└────────────┬─────────────────────────────┘
             │
┌────────────▼─────────────────────────────┐
│   Cloud Run (Containers)                 │
│   Auto-scaling Microservices             │
└────────────┬─────────────────────────────┘
             │
┌────────────▼─────────────────────────────┐
│   Cloud SQL + Firestore                  │
└──────────────────────────────────────────┘

Terraform Configuration:
```hcl
resource "google_cloud_run_service" "api" {
  name     = "api-service"
  location = "us-central1"

  template {
    spec {
      containers {
        image = "gcr.io/project-id/api:latest"

        resources {
          limits = {
            cpu    = "1000m"
            memory = "512Mi"
          }
        }

        env {
          name  = "DATABASE_URL"
          value = google_sql_database_instance.main.connection_name
        }

        env {
          name = "API_KEY"
          value_from {
            secret_key_ref {
              name = google_secret_manager_secret.api_key.secret_id
              key  = "latest"
            }
          }
        }

        ports {
          container_port = 8080
        }
      }

      service_account_name = google_service_account.api.email

      timeout_seconds = 300

      container_concurrency = 80
    }

    metadata {
      annotations = {
        "autoscaling.knative.dev/maxScale" = "100"
        "autoscaling.knative.dev/minScale" = "1"
        "run.googleapis.com/cloudsql-instances" = google_sql_database_instance.main.connection_name
      }
    }
  }

  traffic {
    percent         = 100
    latest_revision = true
  }
}

resource "google_cloud_run_service_iam_member" "public" {
  service  = google_cloud_run_service.api.name
  location = google_cloud_run_service.api.location
  role     = "roles/run.invoker"
  member   = "allUsers"
}

resource "google_sql_database_instance" "main" {
  name             = "main-instance"
  database_version = "POSTGRES_14"
  region           = "us-central1"

  settings {
    tier = "db-f1-micro"

    backup_configuration {
      enabled    = true
      start_time = "03:00"
    }

    ip_configuration {
      ipv4_enabled = false
      private_network = google_compute_network.vpc.id
    }

    database_flags {
      name  = "max_connections"
      value = "100"
    }
  }

  deletion_protection = true
}

resource "google_storage_bucket" "frontend" {
  name          = "my-app-frontend"
  location      = "US"
  force_destroy = false

  uniform_bucket_level_access = true

  website {
    main_page_suffix = "index.html"
    not_found_page   = "404.html"
  }

  cors {
    origin          = ["https://example.com"]
    method          = ["GET", "HEAD"]
    response_header = ["*"]
    max_age_seconds = 3600
  }
}

resource "google_compute_backend_bucket" "cdn" {
  name        = "cdn-backend"
  bucket_name = google_storage_bucket.frontend.name
  enable_cdn  = true

  cdn_policy {
    cache_mode        = "CACHE_ALL_STATIC"
    client_ttl        = 3600
    default_ttl       = 3600
    max_ttl           = 86400
    negative_caching  = true
    serve_while_stale = 86400
  }
}

2. GKE Microservices

Architecture:
┌──────────────────────────────────────────┐
│   Cloud Load Balancer (Ingress)         │
└────────────┬─────────────────────────────┘
             │
┌────────────▼─────────────────────────────┐
│   GKE Cluster                            │
│   ┌──────────┐  ┌──────────┐           │
│   │Service A │  │Service B │           │
│   └──────────┘  └──────────┘           │
└────────────┬─────────────────────────────┘
             │
┌────────────▼─────────────────────────────┐
│   Cloud SQL + Memorystore                │
└──────────────────────────────────────────┘

Kubernetes Manifests:
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
        version: v1
    spec:
      serviceAccountName: user-service-sa
      containers:
      - name: user-service
        image: gcr.io/project-id/user-service:v1.2.3
        ports:
        - containerPort: 8080
          protocol: TCP
        env:
        - name: DATABASE_HOST
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: host
        - name: REDIS_HOST
          value: "10.0.0.3:6379"
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5

---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: user-service
  namespace: production
spec:
  selector:
    app: user-service
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: ClusterIP

---
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: user-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-service
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

---
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main-ingress
  namespace: production
  annotations:
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.global-static-ip-name: "web-static-ip"
    networking.gke.io/managed-certificates: "web-ssl-cert"
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /users/*
        pathType: ImplementationSpecific
        backend:
          service:
            name: user-service
            port:
              number: 80

3. Data Pipeline with BigQuery

Architecture:
┌──────────────────────────────────────────┐
│   Data Sources (APIs, Databases, Files) │
└────────────┬─────────────────────────────┘
             │
┌────────────▼─────────────────────────────┐
│   Cloud Pub/Sub (Streaming)              │
└────────────┬─────────────────────────────┘
             │
┌────────────▼─────────────────────────────┐
│   Dataflow (Processing)                  │
└────────────┬─────────────────────────────┘
             │
┌────────────▼─────────────────────────────┐
│   BigQuery (Data Warehouse)              │
└────────────┬─────────────────────────────┘
             │
┌────────────▼─────────────────────────────┐
│   Looker/Data Studio (Visualization)     │
└──────────────────────────────────────────┘

Python Dataflow Pipeline:
```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.gcp.bigquery import WriteToBigQuery

class TransformData(beam.DoFn):
    def process(self, element):
        import json
        data = json.loads(element)

        # Transform logic
        transformed = {
            'user_id': data['id'],
            'event_type': data['type'],
            'timestamp': data['timestamp'],
            'value': data.get('value', 0)
        }

        yield transformed

def run():
    options = PipelineOptions(
        project='my-project',
        runner='DataflowRunner',
        region='us-central1',
        temp_location='gs://my-bucket/temp',
        staging_location='gs://my-bucket/staging'
    )

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | 'Read from Pub/Sub' >> beam.io.ReadFromPubSub(
                subscription='projects/my-project/subscriptions/my-sub'
              )
            | 'Transform' >> beam.ParDo(TransformData())
            | 'Write to BigQuery' >> WriteToBigQuery(
                'my-project:dataset.table',
                schema='user_id:STRING,event_type:STRING,timestamp:TIMESTAMP,value:FLOAT',
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND
              )
        )

if __name__ == '__main__':
    run()

Security Best Practices

1. IAM Roles

resource "google_project_iam_custom_role" "app_role" {
  role_id     = "appRole"
  title       = "Application Role"
  description = "Custom role for application"
  permissions = [
    "storage.objects.get",
    "storage.objects.create",
    "cloudsql.instances.connect"
  ]
}

resource "google_service_account" "app" {
  account_id   = "app-service-account"
  display_name = "Application Service Account"
}

resource "google_project_iam_member" "app_binding" {
  project = var.project_id
  role    = google_project_iam_custom_role.app_role.id
  member  = "serviceAccount:${google_service_account.app.email}"
}

2. VPC Security

Private Google Access for API access without internet
Cloud NAT for outbound internet
Firewall rules for network security
VPC Service Controls for data exfiltration protection

3. Secret Management

# Create secret
gcloud secrets create api-key --data-file=./key.txt

# Access in Cloud Run
gcloud run deploy api \
  --image gcr.io/project/api \
  --set-secrets=API_KEY=api-key:latest

Cost Optimization

Committed Use Discounts - Up to 57% savings
Sustained Use Discounts - Automatic for Compute Engine
Preemptible VMs - Up to 80% savings
Cloud Storage Classes - Standard, Nearline, Coldline, Archive
BigQuery - Use clustering and partitioning

Monitoring

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = f"projects/{project_id}"

series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/my_metric"
series.resource.type = "gce_instance"

point = monitoring_v3.Point()
point.value.double_value = 3.14
now = time.time()
point.interval.end_time.seconds = int(now)

series.points = [point]
client.create_time_series(name=project_name, time_series=[series])

This skill ensures well-architected, secure GCP solutions optimized for performance and cost.