spark-optimization

Stars: 444

Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.

By Dokhacgiakhoa, updated 2/11/2026

version: 4.1.0-fractal
name: spark-optimization
description: Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.

Apache Spark Optimization

Production patterns for optimizing Apache Spark jobs including partitioning strategies, memory management, shuffle optimization, and performance tuning.
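As a worked example of the memory-management side, the sketch below models how Spark's unified memory manager splits a single executor heap. It assumes the documented Spark 2.x/3.x defaults (300 MB reserved, spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5, overhead = max(384 MB, 10% of spark.executor.memory)). The names executor_memory and ExecutorMemory are hypothetical, and real figures vary by Spark version, deploy mode, and PySpark settings.

```python
# Rough model of Spark's unified memory manager for one executor.
# Constants mirror documented Spark defaults; verify them against the
# configuration docs for the Spark version you actually run.
from dataclasses import dataclass

RESERVED_MB = 300               # fixed reserved system memory
DEFAULT_MEMORY_FRACTION = 0.6   # spark.memory.fraction
DEFAULT_STORAGE_FRACTION = 0.5  # spark.memory.storageFraction
OVERHEAD_FACTOR = 0.10          # spark.executor.memoryOverheadFactor (Spark 3.3+)
MIN_OVERHEAD_MB = 384           # floor for spark.executor.memoryOverhead


@dataclass
class ExecutorMemory:
    heap_mb: int        # spark.executor.memory
    overhead_mb: int    # non-heap overhead requested from the cluster manager
    unified_mb: float   # shared execution + storage pool
    storage_mb: float   # storage share protected from eviction by execution
    user_mb: float      # remaining heap for user data structures

    @property
    def container_mb(self) -> int:
        # Approximate container request (heap + overhead); ignores
        # spark.executor.pyspark.memory and off-heap settings.
        return self.heap_mb + self.overhead_mb


def executor_memory(heap_mb: int,
                    memory_fraction: float = DEFAULT_MEMORY_FRACTION,
                    storage_fraction: float = DEFAULT_STORAGE_FRACTION) -> ExecutorMemory:
    usable = heap_mb - RESERVED_MB
    if usable <= 0:
        # Simplification: real Spark enforces its own minimum heap size.
        raise ValueError("heap must exceed the 300 MB reserved region")
    unified = usable * memory_fraction
    return ExecutorMemory(
        heap_mb=heap_mb,
        overhead_mb=max(MIN_OVERHEAD_MB, int(heap_mb * OVERHEAD_FACTOR)),
        unified_mb=unified,
        storage_mb=unified * storage_fraction,
        user_mb=usable - unified,
    )


if __name__ == "__main__":
    m = executor_memory(8192)  # e.g. spark.executor.memory=8g
    print(f"unified={m.unified_mb:.0f} MB storage={m.storage_mb:.0f} MB "
          f"user={m.user_mb:.0f} MB container~{m.container_mb} MB")
```

Storage and execution can borrow from each other inside the unified pool, so storage_mb is the portion cached blocks keep under pressure, not a hard cap.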

Do not use this skill when

  • The task is unrelated to Apache Spark optimization
  • You need a different domain or tool outside this scope

Instructions

  • Clarify goals, constraints, and required inputs.
  • Apply relevant best practices and validate outcomes.
  • Provide actionable steps and verification.
  • If detailed examples are required, open resources/implementation-playbook.md.

Use this skill when

  • Optimizing slow Spark jobs
  • Tuning memory and executor configuration
  • Implementing efficient partitioning strategies
  • Debugging Spark performance issues
  • Scaling Spark pipelines for large datasets
  • Reducing shuffle and data skew
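To make the data-skew item concrete, here is a stdlib-only simulation of key salting, one common way to spread a hot key across partitions. It is a sketch under stated assumptions: crc32 stands in for Spark's Murmur3-based hash partitioning, salts are assigned round-robin rather than randomly, and every helper name is hypothetical. In a real join the other side must be replicated once per salt value, which explode_salts models. On Spark 3.x, adaptive query execution's skew-join handling (spark.sql.adaptive.skewJoin.enabled) may make manual salting unnecessary.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    """Map a key to a partition. crc32 is a deterministic stand-in for
    Spark's Murmur3-based hash partitioning (illustration only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_loads(keys, num_partitions):
    """Return the number of records that land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_keys(keys, num_salts):
    """Skewed (large) side: append a round-robin salt so one hot key
    becomes num_salts distinct keys."""
    return [f"{k}_{i % num_salts}" for i, k in enumerate(keys)]


def explode_salts(key, num_salts):
    """Other (small) side: replicate a key once per salt value so every
    salted record still finds its join partner."""
    return [f"{key}_{s}" for s in range(num_salts)]


if __name__ == "__main__":
    # Hypothetical skewed dataset: one hot key plus 1,000 rare keys.
    keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]
    parts, salts = 8, 8
    before = partition_loads(keys, parts)
    after = partition_loads(salt_keys(keys, salts), parts)
    print("max partition load before salting:", max(before))
    print("max partition load after salting: ", max(after))
```

The trade-off is that the small side grows by a factor of num_salts, so salting pays off only when the hot key dominates a single task's runtime.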

Core Concepts

🧠 Knowledge Modules (Fractal Skills)

1. Spark Execution Model

2. Key Performance Factors

3. Pattern 1: Optimal Partitioning

4. Pattern 2: Join Optimization

5. Pattern 3: Caching and Persistence

6. Pattern 4: Memory Tuning

7. Pattern 5: Shuffle Optimization

8. Pattern 6: Data Format Optimization

9. Pattern 7: Monitoring and Debugging

10. Do's

11. Don'ts
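For Pattern 1 (Optimal Partitioning), a common starting point is to keep partitions near 128 MB (the default of spark.sql.files.maxPartitionBytes) while giving the cluster 2 to 3 tasks per CPU core, as the Spark tuning guide recommends. The helper below is a hypothetical heuristic, not a Spark API: it takes the larger of the size-based and parallelism-based estimates. The resulting value would then be applied with something like spark.conf.set("spark.sql.shuffle.partitions", n) or df.repartition(n).

```python
import math

MIB = 1024 * 1024
# Mirrors the default of spark.sql.files.maxPartitionBytes (128 MB).
TARGET_PARTITION_BYTES = 128 * MIB


def suggested_partitions(total_bytes: int, total_cores: int,
                         target_bytes: int = TARGET_PARTITION_BYTES,
                         tasks_per_core: int = 2) -> int:
    """Hypothetical heuristic: enough partitions to keep each near
    target_bytes, but never fewer than tasks_per_core * total_cores."""
    by_size = math.ceil(total_bytes / target_bytes)
    by_parallelism = total_cores * tasks_per_core
    return max(1, by_size, by_parallelism)


if __name__ == "__main__":
    # Example: ~100 GiB of shuffle data on a cluster with 50 executor cores.
    n = suggested_partitions(100 * 1024 * MIB, total_cores=50)
    print("suggested partitions:", n)
    # In a PySpark session you might then apply it (not executed here):
    #   spark.conf.set("spark.sql.shuffle.partitions", n)
    #   df = df.repartition(n, "customer_id")  # hypothetical column name
```

Shuffle output can be much larger or smaller than the input, so treat the estimate as a first guess to check against task durations in the Spark UI.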

Install via CLI
npx skills add https://github.com/Dokhacgiakhoa/antigravity-ide --skill spark-optimization
Repository Details
Forks: 137
Branch: main
Path: SKILL.md