
name: question-generation
description: Generate research questions from structural gaps to guide knowledge exploration. Use when user asks what to explore next, needs research direction, or requires gap-bridging guidance.

Question Generation Skill

Purpose: Auto-generate research questions from structural gaps to guide knowledge exploration

Backend: backend/services/question-generator.cjs (264 lines)

Status: ✅ Operational (implemented Phase 2)

What This Skill Does

Generates targeted research questions from detected gaps in knowledge graphs, helping users explore connections and fill structural holes.

Key Capabilities:

Template-based question generation (4 types)
Bridge concept suggestions
Question deduplication and ranking
Output formatting (simple/detailed/markdown)
Gap metadata integration

When to Use This Skill

Use question generation when:

User asks "what should I explore next?"
Graph analysis reveals structural gaps
Need to guide deeper investigation
Planning research directions
Prioritizing knowledge expansion

Question Template System

4 Question Types (by Gap Type)

Structural Questions (disconnected clusters):

"How do {clusterA} and {clusterB} relate?"
"What connects {conceptsA} with {conceptsB}?"
"What is the intermediate concept between {keyA} and {keyB}?"
"Why might {keyA} influence {keyB}?"
"What bridges the gap between {topicA} and {topicB}?"

Topical Questions (topic mismatch):

"How does {clusterA} impact {clusterB}?"
"What role does {keyA} play in {topicB}?"
"How can {conceptsA} enhance {conceptsB}?"
"What insights from {topicA} apply to {topicB}?"
"Where do {clusterA} and {clusterB} overlap?"

Semantic Questions (meaning gap):

"What semantic relationship exists between {keyA} and {keyB}?"
"How are {conceptsA} conceptually similar to {conceptsB}?"
"What shared meaning connects {topicA} and {topicB}?"
"What underlying principle unites {clusterA} and {clusterB}?"
"How can we reframe {keyA} in terms of {keyB}?"

Logical Questions (missing causal links):

"What logical steps connect {keyA} to {keyB}?"
"What causes the relationship between {topicA} and {topicB}?"
"What are the implications of connecting {clusterA} with {clusterB}?"
"How does {keyA} lead to {keyB}?"
"What conditions allow {conceptsA} to influence {conceptsB}?"

Template Variables

Cluster-Level:

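{clusterA} / {clusterB} - Cluster labels built from cluster IDs (e.g., "cluster 0", "cluster 3")
{topicA} / {topicB} - Cluster topics (if labeled; in the examples in this document, unlabeled clusters appear as "cluster N")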

Concept-Level:

{conceptsA} / {conceptsB} - Top 3 concepts from each cluster (comma-separated)
{keyA} / {keyB} - Highest centrality concept from each cluster

Example Substitution:

Template: "What logical steps connect {keyA} to {keyB}?"
Variables: { keyA: "revenue", keyB: "management" }
Output: "What logical steps connect revenue to management?"

Question Output Structure

{
  question: "What logical steps connect revenue to management?",
  gap: {
    from: 0,        // Cluster A ID
    to: 3,          // Cluster B ID
    severity: "major",
    score: 0.805
  },
  type: "logical",
  bridgeConcepts: ["relationship", "connection"]
}

Bridge Concept Suggestions

Heuristic Rules:

Look for substring overlaps between concepts from each cluster
Suggest concept combinations (e.g., "revenue-metrics")
Fallback to generic bridges if no specific matches

Generic Bridge Concepts:

relationship
connection
impact
influence
correlation
dependency

Example:

Cluster A: ["revenue", "projections", "cost"]
Cluster B: ["management", "strategy", "decisions"]

Suggested bridges: ["relationship", "connection"]
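The heuristic above can be sketched roughly as follows. This is a minimal illustration, not the code in question-generator.cjs: the function name `suggestBridges`, the `maxBridges` limit of 2 (taken from the two bridges shown in the example), and the choice to treat "substring overlap" as one concept containing the other are all assumptions. The concept-combination rule (e.g., "revenue-metrics") is not specified in enough detail here to sketch, so it is left out.

```javascript
// Generic fallbacks, in the order listed in this document.
const GENERIC_BRIDGES = [
  'relationship', 'connection', 'impact',
  'influence', 'correlation', 'dependency',
];

// Hypothetical helper: suggests bridge concepts between two clusters.
// Assumption: an "overlap" means one concept string contains the other,
// and the shared (shorter) term is offered as the bridge.
function suggestBridges(conceptsA, conceptsB, maxBridges = 2) {
  const bridges = new Set();
  for (const a of conceptsA) {
    for (const b of conceptsB) {
      const x = a.trim().toLowerCase();
      const y = b.trim().toLowerCase();
      if (!x || !y) continue; // ignore empty concept names
      if (x.includes(y)) bridges.add(y);
      else if (y.includes(x)) bridges.add(x);
    }
  }
  const specific = [...bridges].slice(0, maxBridges);
  // Fall back to generic bridges when no specific match was found.
  return specific.length > 0 ? specific : GENERIC_BRIDGES.slice(0, maxBridges);
}

console.log(suggestBridges(
  ['revenue', 'projections', 'cost'],
  ['management', 'strategy', 'decisions']
)); // [ 'relationship', 'connection' ]
```

With the finance clusters from the example, no concept contains another, so the sketch falls through to the first two generic bridges.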

Generation Options

{
  maxQuestionsPerGap: 3,      // Questions generated per gap
  maxTotalQuestions: 15,      // Total questions across all gaps
  minGapSeverity: 'minor'     // Only generate for minor+ gaps
}

Defaults:

3 questions per gap
15 total questions maximum
Minor severity threshold (excludes "bridged" gaps)
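To make the interaction between the three options concrete, here is a small sketch of how they might be applied. The function name `selectQuestions`, the `candidatesByGap` input shape, and the numeric severity ranks are hypothetical; only the option names, their defaults, and the severity ordering come from this document.

```javascript
// Defaults as documented above.
const DEFAULT_OPTIONS = {
  maxQuestionsPerGap: 3,
  maxTotalQuestions: 15,
  minGapSeverity: 'minor',
};

// Assumed numeric ranks: "bridged" sits below "minor" so the default excludes it.
const SEVERITY_RANK = { bridged: 0, minor: 1, major: 2, critical: 3 };

// Hypothetical helper: candidatesByGap is [{ gap, questions: string[] }, ...].
function selectQuestions(candidatesByGap, options = {}) {
  const opts = { ...DEFAULT_OPTIONS, ...options };
  const minRank = SEVERITY_RANK[opts.minGapSeverity];
  const selected = [];
  for (const { gap, questions } of candidatesByGap) {
    const rank = SEVERITY_RANK[gap.severity] ?? -1; // unknown severities are skipped
    if (rank < minRank) continue;
    for (const question of questions.slice(0, opts.maxQuestionsPerGap)) {
      if (selected.length >= opts.maxTotalQuestions) return selected;
      selected.push({ question, gap });
    }
  }
  return selected;
}

const candidates = [
  { gap: { from: 0, to: 3, severity: 'major' }, questions: ['q1', 'q2', 'q3', 'q4', 'q5'] },
  { gap: { from: 1, to: 2, severity: 'bridged' }, questions: ['b1', 'b2'] },
  { gap: { from: 0, to: 2, severity: 'minor' }, questions: ['m1', 'm2'] },
];
console.log(selectQuestions(candidates).length); // 5
```

The major gap contributes three questions (capped by `maxQuestionsPerGap`), the bridged gap is skipped, and the minor gap contributes two.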

Question Ranking

Priority Order:

Gap severity (critical > major > minor)
Gap score (higher scores prioritized)

Deduplication:

Normalizes each question (lowercase, trimmed) before comparison
Removes questions whose normalized text matches an earlier question
Keeps the first occurrence of each question
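The ranking and deduplication rules above could look roughly like this. The function names and the item shape (`{ question, gap: { severity, score } }`) are assumptions for illustration, and the order of the two steps (deduplicate first, then rank) is a guess rather than something this document states.

```javascript
const SEVERITY_ORDER = { critical: 3, major: 2, minor: 1 };

// Drops questions whose lowercase, trimmed text was already seen.
// The first occurrence is kept.
function dedupeQuestions(items) {
  const seen = new Set();
  return items.filter((item) => {
    const key = item.question.trim().toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Sorts by severity (critical > major > minor), then by gap score descending.
// Returns a new array; the input is not mutated.
function rankQuestions(items) {
  return [...items].sort((a, b) => {
    const bySeverity =
      (SEVERITY_ORDER[b.gap.severity] ?? 0) - (SEVERITY_ORDER[a.gap.severity] ?? 0);
    return bySeverity !== 0 ? bySeverity : b.gap.score - a.gap.score;
  });
}

const ranked = rankQuestions(dedupeQuestions([
  { question: 'How does A lead to B?', gap: { severity: 'minor', score: 0.786 } },
  { question: 'What connects A with B?', gap: { severity: 'major', score: 0.805 } },
  { question: 'how does a lead to b? ', gap: { severity: 'minor', score: 0.786 } },
]));
console.log(ranked.map((q) => q.question));
// [ 'What connects A with B?', 'How does A lead to B?' ]
```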

Output Formats

Simple Format

[
  "What logical steps connect revenue to management?",
  "How do cluster 0 and cluster 3 relate?",
  "What causes the relationship between cluster 0 and cluster 2?"
]

Detailed Format

[
  {
    question: "What logical steps connect revenue to management?",
    gap: "0 → 3",
    severity: "major",
    score: "0.805",
    type: "logical",
    suggestedBridges: "relationship, connection"
  }
]

Markdown Format

1. **What logical steps connect revenue to management?**
   - Gap: 0 → 3 (major)
   - Suggested bridges: relationship, connection

2. **How do cluster 0 and cluster 2 relate?**
   - Gap: 0 → 2 (minor)
   - Suggested bridges: relationship, connection
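As a rough illustration of how the three formats relate to the question output structure, the sketch below derives each one from the same question objects. `formatQuestions` is a hypothetical name, and the three-decimal score formatting is inferred from the "0.805" string shown in the detailed example.

```javascript
// Hypothetical formatter: converts question objects into one of the three
// documented output formats.
function formatQuestions(questions, format = 'simple') {
  switch (format) {
    case 'simple':
      return questions.map((q) => q.question);
    case 'detailed':
      return questions.map((q) => ({
        question: q.question,
        gap: `${q.gap.from} → ${q.gap.to}`,
        severity: q.gap.severity,
        score: q.gap.score.toFixed(3), // assumption: score rendered as a 3-decimal string
        type: q.type,
        suggestedBridges: q.bridgeConcepts.join(', '),
      }));
    case 'markdown':
      return questions
        .map((q, i) => [
          `${i + 1}. **${q.question}**`,
          `   - Gap: ${q.gap.from} → ${q.gap.to} (${q.gap.severity})`,
          `   - Suggested bridges: ${q.bridgeConcepts.join(', ')}`,
        ].join('\n'))
        .join('\n\n');
    default:
      throw new Error(`Unknown format: ${format}`);
  }
}

const sample = [{
  question: 'What logical steps connect revenue to management?',
  gap: { from: 0, to: 3, severity: 'major', score: 0.805 },
  type: 'logical',
  bridgeConcepts: ['relationship', 'connection'],
}];
console.log(formatQuestions(sample, 'markdown'));
```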

Integration Points

Called by:

backend/services/agent-graph-service.cjs:145-151 (automatic on every analysis)

Depends on:

Enhanced gaps from gap-scorer.cjs
Cluster concepts from community detection
Gap metadata (severity, type, score)

Outputs:

Research questions array
Question summary statistics
Formatted question strings

Performance

Speed: <2ms overhead per analysis
Questions Generated: 3-15 per analysis
Cost: $0 (template-based, no API calls)

No regression: analysis time stays at 16-76ms, the same range as Phase 1

Testing

# Test question generation
node -e "
const { analyzeForAgent } = require('./backend/services/agent-graph-service.cjs');
(async () => {
  const result = await analyzeForAgent('finance', 'revenue projections cost structure burn rate management metrics');
  console.log('Questions generated:', result.researchQuestions.length);
  console.log('');
  result.researchQuestions.forEach((q, i) => {
    console.log(\`\${i+1}. \${q.question}\`);
    console.log(\`   Gap: \${q.gap.from} → \${q.gap.to} (\${q.gap.severity})\`);
  });
})();
"

# Expected output:
# Questions generated: 9
#
# 1. What logical steps connect revenue to management?
#    Gap: 0 → 3 (major)
# 2. What causes the relationship between cluster 0 and cluster 3?
#    Gap: 0 → 3 (major)
# ...

Question Summary Statistics

Provided fields:

total - Total questions generated
byType - Breakdown by structural/topical/semantic/logical
bySeverity - Breakdown by critical/major/minor
topQuestions - Top 5 most important questions

Example:

{
  total: 9,
  byType: { structural: 0, topical: 0, semantic: 0, logical: 9 },
  bySeverity: { critical: 0, major: 3, minor: 6 },
  topQuestions: [
    "What logical steps connect revenue to management?",
    "What causes the relationship between cluster 0 and cluster 3?",
    ...
  ]
}
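The summary fields can be computed with a single pass over the question list. The sketch below assumes the questions are already ranked (so the first five are the "top" ones) and uses a hypothetical function name, `summarizeQuestions`.

```javascript
// Hypothetical helper: builds the summary object from ranked question objects.
function summarizeQuestions(questions, topN = 5) {
  const summary = {
    total: questions.length,
    byType: { structural: 0, topical: 0, semantic: 0, logical: 0 },
    bySeverity: { critical: 0, major: 0, minor: 0 },
    // Assumes the input is already sorted by importance.
    topQuestions: questions.slice(0, topN).map((q) => q.question),
  };
  for (const q of questions) {
    summary.byType[q.type] = (summary.byType[q.type] ?? 0) + 1;
    summary.bySeverity[q.gap.severity] = (summary.bySeverity[q.gap.severity] ?? 0) + 1;
  }
  return summary;
}

const summary = summarizeQuestions([
  { question: 'What logical steps connect revenue to management?', type: 'logical', gap: { severity: 'major' } },
  { question: 'What logical steps connect revenue to forecasting?', type: 'logical', gap: { severity: 'minor' } },
]);
console.log(summary.total, summary.byType.logical, summary.bySeverity.major); // 2 2 1
```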

Real-World Examples

@finance (Financial Analysis):

9 questions generated from 3 gaps:

1. What logical steps connect revenue to management?
   Gap: 0 → 3 (major, score=0.805)
   Bridges: relationship, connection

2. What causes the relationship between cluster 0 and cluster 3?
   Gap: 0 → 3 (major)

3. What are the implications of connecting cluster 0 with cluster 3?
   Gap: 0 → 3 (major)

4. What logical steps connect revenue to forecasting?
   Gap: 0 → 2 (minor, score=0.786)

... (9 total)

@marketing (Campaign Strategy):

3 questions generated from 1 gap:

1. What logical steps connect product to conversion?
   Gap: 0 → 1 (major, score=0.702)
   Bridges: relationship, connection

2. What causes the relationship between cluster 0 and cluster 1?
   Gap: 0 → 1 (major)

3. What are the implications of connecting cluster 0 with cluster 1?
   Gap: 0 → 1 (major)

@seo (Search Optimization):

3 questions generated from 1 gap:

1. What semantic relationship exists between keywords and performance?
   Gap: 1 → 2 (minor, score=0.654)
   Bridges: relationship, connection

2. How are keyword, research conceptually similar to performance, rankings?
   Gap: 1 → 2 (minor)

3. What shared meaning connects cluster 1 and cluster 2?
   Gap: 1 → 2 (minor)

Algorithmic Details

Concept Extraction:

Top 3 concepts per cluster (by centrality or order)
Fallback to all concepts if fewer than 3 available
Empty clusters skipped
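A minimal sketch of these extraction rules follows. The cluster shape (`{ id, concepts: [{ name, centrality }] }`) and the function name `extractClusterVariables` are assumptions made for illustration; the real data structures in question-generator.cjs may differ.

```javascript
// Hypothetical helper: derives the {keyX} and {conceptsX} template variables
// for one cluster. Returns null for empty clusters so callers can skip them.
function extractClusterVariables(cluster, topN = 3) {
  if (!cluster.concepts || cluster.concepts.length === 0) return null;
  // Sort by centrality, descending. Concepts without a centrality value
  // compare as equal, so the stable sort keeps their original order.
  const ranked = [...cluster.concepts].sort(
    (a, b) => (b.centrality ?? 0) - (a.centrality ?? 0)
  );
  // slice() returns all concepts when fewer than topN are available.
  const top = ranked.slice(0, topN).map((c) => c.name);
  return { key: top[0], concepts: top.join(', ') };
}

const vars = extractClusterVariables({
  id: 0,
  concepts: [
    { name: 'cost', centrality: 0.2 },
    { name: 'revenue', centrality: 0.9 },
    { name: 'projections', centrality: 0.5 },
    { name: 'burn rate', centrality: 0.1 },
  ],
});
console.log(vars); // { key: 'revenue', concepts: 'revenue, projections, cost' }
```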

Template Filling:

Regex replacement: \{variableName\} → actual value
All placeholders replaced in single pass
No nested template support
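The single-pass replacement described above can be sketched with one global regex and a replacer callback. `fillTemplate` is a hypothetical name, and leaving unknown placeholders untouched is an assumption; the source does not say how missing variables are handled.

```javascript
// Hypothetical helper: fills {placeholders} in a template string in one pass.
// Replaced values are not re-scanned, so nested templates are not expanded.
function fillTemplate(template, variables) {
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(variables, name)
      ? String(variables[name])
      : match // assumption: unknown placeholders are left as-is
  );
}

console.log(fillTemplate(
  'What logical steps connect {keyA} to {keyB}?',
  { keyA: 'revenue', keyB: 'management' }
)); // What logical steps connect revenue to management?
```

Because `String.prototype.replace` with a global regex scans the original string once, a value that itself contains `{keyB}` would be inserted literally, which matches the "no nested template support" note.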

Deduplication Algorithm:

Lowercase + trim normalization
Set-based duplicate detection
Preserves first occurrence

Related Skills

Gap Scoring - Provides gap metadata for question generation
Evolution Tracking - Questions inform next-stage recommendations
AI Enhancement - AI insights complement generated questions

Limitations

Current constraints:

Template-based only (no AI generation)
Maximum 5 templates per gap type
Generic bridge concepts when no specific matches
No context-aware question filtering

Future enhancements (not implemented):

AI-powered custom question generation
Domain-specific template libraries
Bridge concept prediction via embeddings
Question quality scoring

Implementation: Phase 2 (2025-11-06)

Test Status: ✅ Verified operational across @marketing, @seo, @finance

Documentation: workspace/docs/Obsidian-v2/daily/2025-11-06-PHASE2-COMPLETE.md