tech-diagram-generator - SKILL.md Agent Skill

name: tech-diagram-generator
description: Generate professional technical diagrams (Mermaid, ASCII, flowcharts, architecture diagrams) for AI/ML articles. Supports neural network architectures, data pipelines, algorithm flows, and system architectures.

# Technical Diagram Generator

This skill helps you create publication-quality technical diagrams for AI/ML articles using Mermaid, ASCII art, and structured formats such as tables.

## When to Use This Skill

- Creating neural network architecture diagrams
- Visualizing data pipelines and ETL flows
- Documenting algorithm flows and decision trees
- Designing system architecture diagrams
- Explaining transformer attention mechanisms
- Showing training/inference workflows
- Creating comparison tables and matrices

## Supported Diagram Types

### 1. Mermaid Diagrams

graph TB
    Input[Input Data] --> Preprocess[Preprocessing]
    Preprocess --> Model[Neural Network]
    Model --> Output[Predictions]

### 2. ASCII Art Diagrams

+------------------+     +------------------+
|   Input Layer    | --> |   Hidden Layer   |
|   (784 neurons)  |     |   (256 neurons)  |
+------------------+     +------------------+
                                |
                                v
                         +------------------+
                         |   Output Layer   |
                         |   (10 neurons)   |
                         +------------------+

### 3. Architecture Blocks

┌─────────────────────────────────────────┐
│            Transformer Block            │
├─────────────────────────────────────────┤
│  ┌─────────────┐    ┌─────────────┐     │
│  │Multi-Head   │    │  Feed       │     │
│  │Attention    │───>│  Forward    │     │
│  └─────────────┘    └─────────────┘     │
│        │                   │            │
│        └───────┬───────────┘            │
│                ▼                        │
│        ┌─────────────┐                  │
│        │Layer Norm   │                  │
│        └─────────────┘                  │
└─────────────────────────────────────────┘

## Instructions

### For Neural Network Architectures

When asked to visualize a neural network:

## Network Architecture: [Name]

### Overview
[Brief description of the architecture]

### Architecture Diagram

```mermaid
graph TB
    subgraph Input
        X["Input Features<br/>shape: (batch, seq, dim)"]
    end

    subgraph Encoder
        E1[Embedding Layer]
        E2[Positional Encoding]
        E3[Multi-Head Attention]
        E4[Layer Norm + FFN]
    end

    subgraph Output
        O1[Linear Projection]
        O2[Softmax]
        Y[Predictions]
    end

    X --> E1 --> E2 --> E3 --> E4 --> O1 --> O2 --> Y
```

### Layer Details

| Layer | Input Shape | Output Shape | Parameters |
|-------|-------------|--------------|------------|
| Embedding | (B, L) | (B, L, D) | V × D |
| Attention | (B, L, D) | (B, L, D) | 4D² |
| FFN | (B, L, D) | (B, L, D) | 2D × 4D |

### Key Components

- Embedding Layer: Maps tokens to dense vectors
- Positional Encoding: Adds position information
- Multi-Head Attention: Computes attention weights
- Feed-Forward Network: Non-linear transformation
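
The parameter column in the layer table can be sanity-checked with a few lines of Python. This is a minimal sketch: the function name `transformer_layer_params` and the example sizes (`vocab_size=32_000`, `d_model=512`) are illustrative assumptions, and the counts ignore biases and layer-norm weights, as the table does.

```python
def transformer_layer_params(vocab_size: int, d_model: int) -> dict:
    """Approximate weight counts per layer, ignoring biases and layer norm."""
    return {
        "Embedding": vocab_size * d_model,      # V x D lookup table
        "Attention": 4 * d_model ** 2,          # Q, K, V and output projections
        "FFN": 2 * d_model * (4 * d_model),     # D -> 4D and 4D -> D matrices
    }


# Hypothetical sizes chosen only for illustration.
counts = transformer_layer_params(vocab_size=32_000, d_model=512)
for layer, n_params in counts.items():
    print(f"{layer:<10} {n_params:>12,}")
```

Filling the table from a helper like this keeps the diagram and the numbers in the text consistent when the model dimensions change.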


### For Data Pipelines

When asked to visualize a data pipeline:

```markdown
## Data Pipeline: [Name]

### Pipeline Flow

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Raw Data   │───>│  Validation  │───>│  Transform   │
│   (S3/GCS)   │    │   & Clean    │    │  & Feature   │
└──────────────┘    └──────────────┘    └──────────────┘
                                               │
                                               ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Serving    │<───│   Training   │<───│     Data     │
│    Layer     │    │   Pipeline   │    │  Splitting   │
└──────────────┘    └──────────────┘    └──────────────┘

### Processing Steps

| Stage | Operation | Tools | Output |
|-------|-----------|-------|--------|
| Ingestion | Read from source | Spark, Beam | Raw DataFrame |
| Validation | Schema check | Great Expectations | Cleaned DataFrame |
| Transform | Feature engineering | Pandas, PySpark | Feature Matrix |
| Split | Train/Val/Test | Scikit-learn | Split Data |
```
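
When the stages are already listed in a table like the one above, the Mermaid version of the flow can be generated rather than typed by hand. The sketch below is one possible helper, not part of the skill itself: the name `pipeline_to_mermaid` and the node ids are assumptions made for illustration.

```python
def pipeline_to_mermaid(stages, direction="LR"):
    """Render an ordered list of (node_id, label) stages as a linear Mermaid graph."""
    lines = [f"graph {direction}"]
    # Declare every node once, with a quoted label so punctuation is safe.
    for node_id, label in stages:
        lines.append(f'    {node_id}["{label}"]')
    # Connect each stage to the one that follows it.
    for (source, _), (target, _) in zip(stages, stages[1:]):
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


# Stage names taken from the Processing Steps table; ids are hypothetical.
stages = [
    ("ingest", "Ingestion"),
    ("validate", "Validation"),
    ("transform", "Transform"),
    ("split", "Split"),
]
diagram = pipeline_to_mermaid(stages)
print(diagram)
```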

### For Algorithm Flows

When asked to visualize an algorithm:

## Algorithm: [Name]

### Flowchart

```mermaid
flowchart TD
    A[Start] --> B{Condition?}
    B -->|Yes| C[Action A]
    B -->|No| D[Action B]
    C --> E[Process]
    D --> E
    E --> F{Converged?}
    F -->|No| B
    F -->|Yes| G[End]
```

### Pseudocode

algorithm AlgorithmName(input):
    # Step 1: Initialization
    initialize parameters

    # Step 2: Main Loop
    while not converged:
        # Step 2.1: Compute
        result = compute(input)

        # Step 2.2: Update
        parameters = update(result)

        # Step 2.3: Check
        if convergence_criteria():
            break

    return output

### Complexity Analysis

| Metric | Value |
|--------|-------|
| Time Complexity | O(n log n) |
| Space Complexity | O(n) |
| Best Case | O(n) |
| Worst Case | O(n²) |
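
To show how the pseudocode skeleton maps onto real code, here is one concrete instance of the initialize / compute / update / check loop. Newton's method for square roots is used purely as an illustrative stand-in (it is not an algorithm named in this skill), and the tolerance and iteration cap are assumed defaults.

```python
def newton_sqrt(value: float, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Approximate sqrt(value) for value >= 0 with Newton's method."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return 0.0

    # Step 1: Initialization
    estimate = value if value >= 1 else 1.0

    # Step 2: Main loop (bounded so it always terminates)
    for _ in range(max_iter):
        previous = estimate
        # Steps 2.1 and 2.2: compute the Newton step and update the estimate
        estimate = 0.5 * (previous + value / previous)
        # Step 2.3: stop once the relative change is below the tolerance
        if abs(estimate - previous) <= tol * estimate:
            break

    return estimate


print(newton_sqrt(2.0))
```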


### For Attention Mechanisms

When asked to explain attention:

```markdown
## Attention Mechanism

### Scaled Dot-Product Attention

           Query (Q)
               │
               ▼
┌─────────────────────────────┐
│                             │
│           Q × K^T           │ ←── Key (K)
│              │              │
│              ▼              │
│       / √d_k (Scale)        │
│              │              │
│              ▼              │
│           Softmax           │
│              │              │
│              ▼              │
│     × V (Weighted Sum)      │ ←── Value (V)
│              │              │
│              ▼              │
│           Output            │
│                             │
└─────────────────────────────┘


### Multi-Head Attention

```mermaid
graph LR
    Q[Query] --> H1[Head 1]
    Q --> H2[Head 2]
    Q --> H3[Head 3]
    Q --> Hn[Head N]

    K[Key] --> H1
    K --> H2
    K --> H3
    K --> Hn

    V[Value] --> H1
    V --> H2
    V --> H3
    V --> Hn

    H1 --> C[Concat]
    H2 --> C
    H3 --> C
    Hn --> C

    C --> L[Linear]
    L --> O[Output]
```

### Mathematical Formulation

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$

Where:

- $Q \in \mathbb{R}^{n \times d_k}$ (Query matrix)
- $K \in \mathbb{R}^{n \times d_k}$ (Key matrix)
- $V \in \mathbb{R}^{n \times d_v}$ (Value matrix)
- $d_k$ = dimension of keys
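
The attention formula can be followed step by step in plain Python. This is a small reference sketch for checking a diagram against the math, not an efficient implementation: matrices are lists of rows, the helper names `softmax` and `attention` are chosen here for illustration, and the example inputs are arbitrary.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for matrices given as lists of rows."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # One row of Q K^T, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        output.append([
            sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))
        ])
    return output


# Arbitrary 2-token example with d_k = d_v = 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = attention(Q, K, V)
print(out)
```

Each output row is a convex combination of the rows of `V`, so every value stays between the smallest and largest entry of the corresponding column.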


### For Training Workflows

When asked to show the training process:

```markdown
## Training Workflow

### Training Loop

```mermaid
sequenceDiagram
    participant D as DataLoader
    participant M as Model
    participant L as Loss Function
    participant O as Optimizer

    loop For each epoch
        D->>M: Forward pass (batch)
        M->>L: Compute loss
        L->>O: Backward pass
        O->>M: Update weights
    end

    Note over M: Model converged!
```

### Training Pipeline

┌────────────────────────────────────────────────────────┐
│                     Training Loop                      │
├────────────────────────────────────────────────────────┤
│                                                        │
│   ┌─────────┐   ┌─────────┐   ┌─────────┐              │
│   │  Batch  │──>│ Forward │──>│  Loss   │              │
│   │  Data   │   │  Pass   │   │  Calc   │              │
│   └─────────┘   └─────────┘   └─────────┘              │
│                                     │                  │
│                                     ▼                  │
│   ┌─────────┐   ┌─────────┐   ┌─────────┐              │
│   │ Update  │<──│Gradient │<──│Backward │              │
│   │ Weights │   │  Calc   │   │  Pass   │              │
│   └─────────┘   └─────────┘   └─────────┘              │
│                                                        │
└────────────────────────────────────────────────────────┘

### Hyperparameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| Learning Rate | 1e-4 | Step size for gradient descent |
| Batch Size | 32 | Samples per gradient update |
| Epochs | 100 | Full dataset iterations |
| Optimizer | AdamW | Adaptive learning rate |
| Scheduler | Cosine | LR decay strategy |
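
The "Cosine" scheduler row can be made concrete with a short function, which is useful when a diagram or table needs actual learning-rate values. The sketch below shows one common form of cosine decay with no warmup; the function name `cosine_lr` and the `min_lr` parameter are assumptions, and the base rate of 1e-4 is taken from the table.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 1e-4, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr at step 0 down to min_lr at total_steps."""
    # Clamp progress so steps outside [0, total_steps] stay in range.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Sample the schedule at a few points of a hypothetical 100-step run.
for step in (0, 25, 50, 75, 100):
    print(step, cosine_lr(step, total_steps=100))
```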


### For System Architectures

When asked to show a system design:

```markdown
## System Architecture: [Name]

### High-Level Architecture

```mermaid
graph TB
    subgraph Client
        UI[Web UI]
        API[API Client]
    end

    subgraph Load Balancer
        LB[Nginx/ALB]
    end

    subgraph Application
        API1[API Server 1]
        API2[API Server 2]
    end

    subgraph Services
        Auth[Auth Service]
        ML[ML Service]
        Cache[Redis Cache]
    end

    subgraph Data
        DB[(PostgreSQL)]
        S3[(S3 Storage)]
    end

    UI --> LB
    API --> LB
    LB --> API1
    LB --> API2
    API1 --> Auth
    API2 --> ML
    ML --> Cache
    Auth --> DB
    ML --> S3
```

### Component Details

| Component | Technology | Purpose |
|-----------|------------|---------|
| Frontend | React | User interface |
| API Gateway | FastAPI | Request routing |
| Auth | JWT | Authentication |
| ML Service | PyTorch | Model inference |
| Cache | Redis | Response caching |
| Database | PostgreSQL | Data persistence |
| Storage | S3 | Model artifacts |
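
For larger systems, the grouped layout above can be produced from plain data structures. The following is a rough sketch under stated assumptions: `architecture_to_mermaid` is a hypothetical helper, and it gives each subgraph an explicit id (`g0`, `g1`, ...) with a quoted title so that titles containing spaces do not depend on parser leniency.

```python
def architecture_to_mermaid(groups, edges, direction="TB"):
    """Render grouped nodes as Mermaid subgraphs followed by their edges.

    groups: {subgraph title: {node_id: label}}
    edges:  iterable of (source_id, target_id) pairs
    """
    lines = [f"graph {direction}"]
    for index, (title, nodes) in enumerate(groups.items()):
        lines.append(f'    subgraph g{index}["{title}"]')
        for node_id, label in nodes.items():
            lines.append(f'        {node_id}["{label}"]')
        lines.append("    end")
    for source, target in edges:
        lines.append(f"    {source} --> {target}")
    return "\n".join(lines)


# A two-tier excerpt of the system shown in the diagram.
groups = {
    "Load Balancer": {"LB": "Nginx/ALB"},
    "Application": {"API1": "API Server 1", "API2": "API Server 2"},
}
edges = [("LB", "API1"), ("LB", "API2")]
diagram = architecture_to_mermaid(groups, edges)
print(diagram)
```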


## Best Practices

### Diagram Design Principles

1. **Clarity First**: Diagrams should be immediately understandable
2. **Consistent Styling**: Use consistent colors, shapes, and fonts
3. **Logical Flow**: Information should flow naturally (top-to-bottom or left-to-right)
4. **Appropriate Detail**: Show enough detail without overwhelming
5. **Clear Labels**: Every component should be labeled

### Color Coding

┌─────────────────────────────────────────┐
│           Color Coding Guide            │
├─────────────────────────────────────────┤
│  🔵 Blue    - Input/Output layers       │
│  🟢 Green   - Processing/Transform      │
│  🟡 Yellow  - Decision points           │
│  🔴 Red     - Error/Warning states      │
│  🟣 Purple  - Model components          │
│  ⚪ Gray    - Infrastructure            │
└─────────────────────────────────────────┘
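
In Mermaid, a color guide like this is usually applied with `classDef` and `class` statements. The sketch below generates those lines from a role-to-color mapping; the role names and hex values are illustrative assumptions, since the guide only names the colors.

```python
# Hypothetical role names and hex values matching the guide's six colors.
ROLE_COLORS = {
    "io": "#4a90d9",        # blue: input/output layers
    "process": "#5cb85c",   # green: processing/transform
    "decision": "#f0ad4e",  # yellow: decision points
    "warning": "#d9534f",   # red: error/warning states
    "model": "#9b59b6",     # purple: model components
    "infra": "#9e9e9e",     # gray: infrastructure
}


def class_defs(colors=ROLE_COLORS):
    """Return one Mermaid classDef line per role."""
    return [
        f"    classDef {role} fill:{color},stroke:#333333"
        for role, color in colors.items()
    ]


def assign_class(node_ids, role):
    """Return a Mermaid class statement attaching a role to the given nodes."""
    return f"    class {','.join(node_ids)} {role}"


print("\n".join(class_defs()))
print(assign_class(["X", "Y"], "io"))
```

Appending these lines to the end of a `graph` block styles the listed nodes without touching the node definitions themselves.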


### Shape Conventions

┌─────────────┐
│   Process   │     Rectangle: Process/Data
└─────────────┘

       ◇
     /   \
    ◇     ◇         Diamond: Decision
     \   /
       ◇

   (Circle)         Circle: Start/End

  [/Parallel]       Parallelogram: I/O


## Quick Reference

### Mermaid Syntax Cheat Sheet

graph TB          # Top-bottom flowchart
graph LR          # Left-right flowchart
flowchart TD      # Flowchart with decisions
sequenceDiagram   # Sequence diagram
classDiagram      # UML class diagram
stateDiagram      # State machine
pie               # Pie chart
gantt             # Gantt chart


### Common Patterns

**Neural Network Layer**:
```mermaid
graph LR
    Input[Input<br/>B×L×D] --> Linear[Linear<br/>D→H]
    Linear --> Activation[ReLU/GELU]
    Activation --> Output[Output<br/>B×L×H]
```

**Attention Block**:

graph TB
    X[Input] --> QKV[Q, K, V Projection]
    QKV --> Attn[Scaled Dot-Product]
    Attn --> Proj[Output Projection]
    Proj --> Add[Residual Add]
    Add --> Out[Output]

**Data Flow**:

graph LR
    A[Source] -->|Read| B[Transform]
    B -->|Process| C[Validate]
    C -->|Write| D[Sink]

## Examples

### Example 1: Transformer Architecture

## Transformer Architecture

### Complete Architecture

```mermaid
graph TB
    subgraph Input
        Tok[Tokenization]
        Pos[Positional Encoding]
    end

    subgraph Encoder
        E_MHA[Multi-Head Attention]
        E_FF[Feed Forward]
        E_N1[Layer Norm]
        E_N2[Layer Norm]
    end

    subgraph Decoder
        D_MHA1[Masked Attention]
        D_MHA2[Cross Attention]
        D_FF[Feed Forward]
        D_N1[Layer Norm]
        D_N2[Layer Norm]
        D_N3[Layer Norm]
    end

    subgraph Output
        Linear[Linear]
        Softmax[Softmax]
    end

    Tok --> Pos --> E_MHA --> E_N1 --> E_FF --> E_N2
    E_N2 --> D_MHA2
    Pos --> D_MHA1 --> D_N1 --> D_MHA2 --> D_N2 --> D_FF --> D_N3 --> Linear --> Softmax
```


### Example 2: Diffusion Model

```markdown
## Diffusion Model Architecture

### Forward & Reverse Process

    Forward Process (Add Noise)
    ───────────────────────────>

    x₀ ──► x₁ ──► x₂ ──► ... ──► xₜ ──► ... ──► x_T
    🖼️     🌫️     🌫️              🌫️           📺

    <───────────────────────────
    Reverse Process (Denoise)

    x₀ ←── x₁ ←── x₂ ←── ... ←── xₜ ←── ... ←── x_T
    🖼️     🌫️     🌫️              🌫️           📺
          │      │               │
          ▼      ▼               ▼
        ε̂₁    ε̂₂    ...       ε̂ₜ
        (Predicted Noise)


### U-Net Denoiser

```mermaid
graph TB
    subgraph Encoder
        E1[Conv 64]
        E2[Conv 128]
        E3[Conv 256]
        E4[Conv 512]
    end

    subgraph Bottleneck
        B[Attention Blocks]
    end

    subgraph Decoder
        D4[UpConv 256]
        D3[UpConv 128]
        D2[UpConv 64]
        D1[Output Conv]
    end

    x_t --> E1 --> E2 --> E3 --> E4 --> B
    B --> D4 --> D3 --> D2 --> D1 --> Eps["ε̂"]

    E1 -.-> D2
    E2 -.-> D3
    E3 -.-> D4
```
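
The forward (noising) arrow in Example 2 is commonly written in closed form as x_t = sqrt(ᾱ_t)·x₀ + sqrt(1 − ᾱ_t)·ε, where ᾱ_t is the running product of (1 − β_s). The sketch below illustrates that form in plain Python. The linear β schedule, its endpoints (1e-4 to 0.02), the 1000 steps, and the helper names are all assumptions made for illustration; the example itself does not specify them.

```python
import math
import random


def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        bars.append(running)
    return bars


def add_noise(x0, t, bars, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps with eps ~ N(0, I)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    scale = math.sqrt(bars[t])
    sigma = math.sqrt(1.0 - bars[t])
    x_t = [scale * x + sigma * e for x, e in zip(x0, eps)]
    return x_t, eps


bars = alpha_bars(1000)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
x_t, eps = add_noise([1.0, -1.0, 0.5], t=999, bars=bars, rng=rng)
print(bars[0], bars[-1])
```

Because ᾱ_t shrinks toward zero, the last step is almost pure noise, which matches the right-hand end of the diagram. The `eps` returned here is the quantity the denoiser's ε̂ output is meant to predict.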


## Tips for AI/ML Diagrams

1. **Show Tensor Shapes**: Always include dimensions (B, L, D)
2. **Mark Trainable Parameters**: Use different colors for learnable vs fixed
3. **Highlight Data Flow**: Use arrows to show information flow
4. **Include Mathematical Notation**: Add equations where helpful
5. **Show Batch Dimension**: Always include B for batch
6. **Use Subgraphs**: Group related components together
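
Tips 1 and 5 are easy to apply consistently with a small label helper. This is a sketch only: `shape_node` is a hypothetical function, and it wraps the label in double quotes because parentheses in an unquoted Mermaid label can trip the parser.

```python
def shape_node(node_id, name, shape):
    """Build a Mermaid node whose label carries a tensor shape on a second line."""
    dims = ", ".join(str(dim) for dim in shape)
    return f'{node_id}["{name}<br/>shape: ({dims})"]'


# Symbolic and concrete dimensions both work, with the batch dimension first.
print(shape_node("X", "Input Features", ("B", "L", "D")))
print(shape_node("H", "Hidden", ("B", 128, 256)))
```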