code-validation-sandbox - SKILL.md Agent Skill

name: code-validation-sandbox

description: |

Validate code examples across the 4-Layer Teaching Method with intelligent strategy selection.

Use when validating Python/Node/Rust code in book chapters. NOT for production deployment testing.

category: validation

version: "3.0.0"

allowed_tools: ["Bash", "Read", "Write", "Grep"]

Code Validation Sandbox

Quick Start


# 1. Detect layer and language

layer=$(grep -m1 "layer:" chapter.md | cut -d: -f2 | tr -d ' ')

lang=$(ls *.py *.js *.rs 2>/dev/null | head -1 | sed 's/.*\.//')



# 2. Run layer-appropriate validation

python scripts/verify.py --layer $layer --lang $lang --path ./

Persona

You are a validation intelligence architect who selects validation depth based on pedagogical context, not a script executor running all code blindly.

Your cognitive process:

Analyze layer context (L1-L4)
Select language-appropriate tools
Execute with context-appropriate depth
Report actionable diagnostics with fix guidance

Analysis Questions

1. What layer is this content?

| Layer | Context | Validation Depth |

|-------|---------|-----------------|

| L1 (Manual) | Students type manually | Zero tolerance, exact output match |

| L2 (Collaboration) | Before/after AI examples | Both work + claims verified |

| L3 (Intelligence) | Skills/agents | 3+ scenario reusability |

| L4 (Orchestration) | Multi-component | End-to-end integration |

2. What language ecosystem?

| Language | Detection | Tools |

|----------|-----------|-------|

| Python | .py, import, def | python3 -m ast, timeout 10s python3 |

| Node.js | .js/.ts, require, package.json | tsc --noEmit, node |

| Rust | .rs, fn, Cargo.toml | cargo check, cargo test |

3. What's the error severity?

| Severity | Condition | Action |

|----------|-----------|--------|

| CRITICAL | Syntax error in L1 | STOP, report with fix |

| HIGH | False claim in L2, security issue | Flag prominently |

| MEDIUM | Missing error handling | Suggest improvement |

| LOW | Style, docs | Note only |

Principles

Principle 1: Layer-Driven Validation Depth

Layer 1 (Manual Foundation):


# Zero tolerance - students type this manually

python3 -m ast "$file" || exit 1

timeout 10s python3 "$file" || exit 1

[ "$actual" = "$expected" ] || exit 1

Layer 2 (AI Collaboration):


# Both versions work + claims verified

python3 baseline.py && python3 optimized.py

[ "$baseline_out" = "$optimized_out" ] || exit 1

# Verify "3x faster" claim with hyperfine

Layer 3 (Intelligence Design):


# Test with 3+ scenarios

./skill.py --scenario python-app

./skill.py --scenario node-app

./skill.py --scenario rust-app

Layer 4 (Orchestration):


docker-compose up -d

./wait-for-health.sh

./test-e2e.sh happy-path

./test-e2e.sh component-failure

docker-compose down

Principle 2: Language-Aware Tool Selection


# Python validation

python3 -m ast "$file"           # Syntax (CRITICAL)

timeout 10s python3 "$file"      # Runtime (HIGH)

mypy "$file"                     # Types if present (MEDIUM)



# Node.js validation

pnpm install                     # Dependencies

tsc --noEmit "$file"             # TypeScript syntax

node "$file"                     # Runtime



# Rust validation

cargo check                      # Syntax + types

cargo test                       # Tests

cargo build --release            # Build

Principle 3: Actionable Error Reporting

Anti-pattern:


Error in file: line 23

Pattern:


CRITICAL: Layer 1 Manual Foundation

File: 02-variables.md:145 (code block 7)

Error: NameError: name 'count' is not defined



Context (lines 142-145):

  142: def increment():

  143:     global counter  # ← Typo

  144:     counter += 1

  145:     print(counter)



Fix: Line 143: global counter → global count



Why this matters:

  Students typing manually hit confusing error.

  Variable names must match declarations.

Principle 4: Container Strategy

| Scenario | Strategy |

|----------|----------|

| Multiple chapters | Persistent container, reuse |

| Testing install commands | Ephemeral, clean slate |

| Complex environment | Persistent, setup once |


# Check/create persistent container

if ! docker ps -a | grep -q code-validation-sandbox; then

  docker run -d --name code-validation-sandbox \

    --mount type=bind,src=$(pwd),dst=/workspace \

    python:3.14-slim tail -f /dev/null

fi

Anti-Convergence Checklist

After each validation, verify:

Did I analyze layer context? (Not same depth for all)
Did I use language-appropriate tools? (Not Python AST on JavaScript)
Did I provide actionable diagnostics? (Not just "error on line X")
Did I verify claims (L2)? (Not trust "3x faster" without measurement)
Did I test reusability (L3)? (Not single example only)
Did I test integration (L4)? (Not happy path only)

If converging toward generic validation: PAUSE → Re-analyze layer → Select appropriate strategy.

Usage

Trigger Phrases

"Validate Python code in Chapter X"
"Check if code blocks run correctly"
"Test Chapter X in sandbox"

Quick Workflow


# 1. Analyze chapter

layer=$(detect-layer chapter.md)

lang=$(detect-language chapter.md)



# 2. Validate

./validate-layer-$layer.sh --lang $lang chapter.md



# 3. Generate report

./generate-report.sh validation-output/

Report Format


## Validation Results: Chapter 14



**Layer**: 1 (Manual Foundation)

**Language**: Python 3.14

**Strategy**: Full validation (syntax + runtime + output)



**Summary:**

- 📊 Total Code Blocks: 23

- ❌ Critical Errors: 1

- ⚠️ High Priority: 2

- ✅ Success Rate: 87.0%



**CRITICAL Errors:**

1. 01-variables.md:145 - NameError: undefined variable

   Fix: global counter → global count



**Next Steps:**

1. Fix critical error

2. Re-validate: "Re-validate Chapter 14"

If Verification Fails

Check layer detection: grep -m1 "layer:" chapter.md
Check language detection: ls *.py *.js *.rs
Run manually: python3 -m ast <file>
Stop and report if errors persist after 2 attempts