skeleton-of-thought - SKILL.md Agent Skill

name: skeleton-of-thought description: "Parallel generation through skeleton-first approach for 2x speedup" trigger: "keyword" keywords: ["fast", "parallel generate", "speed up", "quick response"] priority: 4

Skeleton-of-Thought (SoT) Skill

Achieves up to 2x speedup through parallel content generation.

Core Concept

Instead of sequential generation, SoT:

First generates a skeleton (outline) of the answer
Then expands each skeleton point in PARALLEL
Finally assembles the complete response

When to Use

Long-form responses (>500 words expected)
Structured content (lists, tutorials, explanations)
Time-sensitive queries
Batch processing scenarios

When NOT to Use

Math/reasoning problems (need sequential thought)
Simple factual queries
Creative writing requiring flow
Code generation requiring context

Process

Step 1: Skeleton Generation

Given question, generate ONLY the skeleton outline:
- Point 1: [brief description]
- Point 2: [brief description]
- Point 3: [brief description]
...
Do NOT expand. Just the skeleton.

Step 2: Parallel Expansion

Launch parallel expansions for each point:

Point 1 → Agent 1 → Expanded content
Point 2 → Agent 2 → Expanded content
Point 3 → Agent 3 → Expanded content
(all run simultaneously)

Step 3: Assembly

Combine expanded points with transitions:
[Introduction]
[Point 1 expanded]
[Transition]
[Point 2 expanded]
[Transition]
[Point 3 expanded]
[Conclusion]

Implementation

# Skeleton generation prompt
SKELETON_PROMPT = """
For the question: {question}

Generate ONLY a skeleton outline with 3-8 key points.
Format:
1. [Point]: [5-10 word description]
2. [Point]: [5-10 word description]
...

Do NOT expand the points. ONLY the skeleton.
"""

# Point expansion prompt
EXPAND_PROMPT = """
Context: Answering "{question}"
Skeleton: {skeleton}

Expand ONLY point {point_number}: "{point_description}"

Write 2-4 sentences expanding this point.
Do not include other points.
"""

Performance

Query Type	Sequential Time	SoT Time	Speedup
Tutorial	10s	5s	2.0x
Explanation	8s	4.5s	1.8x
List-based	12s	6s	2.0x
Analysis	15s	9s	1.7x

Quality Preservation

Skeleton ensures coherent structure
Each expansion has full question context
Assembly adds smooth transitions
Quality remains high despite parallelism

Combination with Other Skills

Self-Consistency: Generate multiple skeletons, vote on best
ToT: Use ToT for skeleton generation, SoT for expansion
LATS: Apply LATS scoring to skeleton options

Reference: "Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding" (2023)