hypothesis-generation - SKILL.md Agent Skill

name: hypothesis-generation
description: Generate testable hypotheses from data patterns and literature
category: workflow

Hypothesis Generation

When to Use This Skill

After identifying an interesting pattern in the data
When a previous hypothesis was rejected and you need alternatives
At the start of investigation (bootstrap from literature)
When you're stuck and need fresh ideas

The Process

1. Review Current Knowledge

What patterns have been observed?

Check the knowledge graph summary
What group differences exist?
What correlations are surprising?
What contradicts expectations?

What has been tested already?

Which hypotheses were supported?
Which were rejected? (Don't repeat these!)
What did negative results tell us?

What does literature say?

Search PubMed for relevant papers
Extract known mechanisms
Identify knowledge gaps

2. Formulate Specific, Testable Hypotheses

Good hypotheses have this structure: "X causes Y via mechanism Z"

Examples:

✅ Good: "Hypothermia increases nucleotide salvage flux by upregulating APRT enzyme activity"
❌ Too vague: "Metabolism changes in hypothermia"
❌ Not testable: "The brain adapts to cold"

Requirements:

Must be testable with available data
Must be falsifiable (some possible result would show it is wrong)
Should suggest a specific analysis
Should have mechanistic basis (not just correlation)
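The "X causes Y via mechanism Z" structure can be captured as a small record, which makes a missing mechanism or analysis easy to spot. This is a minimal sketch: the `Hypothesis` class, its field names, and the example analysis string are illustrative assumptions, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical record for one 'X causes Y via mechanism Z' hypothesis."""
    cause: str      # X
    effect: str     # Y
    mechanism: str  # Z
    analysis: str   # the specific analysis the hypothesis suggests

    def missing_parts(self) -> list[str]:
        """Return the names of fields that were left blank."""
        return [name for name, value in vars(self).items() if not value.strip()]

    def statement(self) -> str:
        return f"{self.cause} causes {self.effect} via {self.mechanism}"


good = Hypothesis(
    cause="Hypothermia",
    effect="increased nucleotide salvage flux",
    mechanism="upregulation of APRT enzyme activity",
    analysis="compare APRT activity between groups",  # illustrative only
)
vague = Hypothesis(
    cause="Hypothermia", effect="metabolism changes", mechanism="", analysis=""
)

print(good.statement())
print(vague.missing_parts())
```

A hypothesis with a non-empty `missing_parts()` list is a candidate for the "too vague" bucket. Passing this check does not prove the hypothesis is falsifiable or testable with the available data.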

3. Prioritize Hypotheses

Score each hypothesis on:

Impact (1-5): How central to the research question?

5 = Directly explains the core phenotype
3 = Fills in a mechanistic detail
1 = Minor tangential observation

Feasibility (1-5): Can we test it with current data?

5 = Have all required variables
3 = Can construct proxy measure
1 = Missing critical data

Novelty (1-5): Is this a new insight?

5 = No one has asked this before
3 = Refinement of known mechanism
1 = Well-studied question

Coherence (1-5): Fits with existing findings?

5 = Explains contradictions or connects findings
3 = Extends current model
1 = Orthogonal to other findings

Total Priority Score = Impact × 0.4 + Feasibility × 0.3 + Novelty × 0.2 + Coherence × 0.1

Test highest-scoring hypotheses first.
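The weighted sum above can be computed with a short helper. This is a sketch: the function name `priority_score` and the `WEIGHTS` table are assumptions introduced for illustration, and the ratings in the usage line are arbitrary.

```python
# Weights taken from the Total Priority Score formula; they sum to 1.0,
# so the result stays on the same 1-5 scale as the individual ratings.
WEIGHTS = {"impact": 0.4, "feasibility": 0.3, "novelty": 0.2, "coherence": 0.1}


def priority_score(impact: int, feasibility: int, novelty: int, coherence: int) -> float:
    """Return the weighted priority score, rounded to two decimals."""
    ratings = {
        "impact": impact,
        "feasibility": feasibility,
        "novelty": novelty,
        "coherence": coherence,
    }
    for name, value in ratings.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return round(sum(WEIGHTS[name] * value for name, value in ratings.items()), 2)


# Arbitrary example ratings: 5*0.4 + 3*0.3 + 4*0.2 + 2*0.1
print(priority_score(impact=5, feasibility=3, novelty=4, coherence=2))  # 3.9
```

Because Impact carries the largest weight, a one-point change in Impact moves the total four times as much as a one-point change in Coherence.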

4. Design the Test

For each hypothesis, specify:

What statistical test to use
What variables to compare
What result would support the hypothesis
What result would reject it
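Writing the four items down before running any analysis keeps the support and rejection criteria from drifting after the results are in. The sketch below assumes a hypothetical `AnalysisPlan` record. The choice of test and the wording of the criteria in the example are illustrative, not prescribed by this skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisPlan:
    """Hypothetical record of a test design, filled in before looking at results."""
    hypothesis: str
    statistical_test: str
    variables: tuple[str, ...]
    supports_if: str
    rejects_if: str

    def is_complete(self) -> bool:
        """True only when every item is specified."""
        texts = (self.hypothesis, self.statistical_test, self.supports_if, self.rejects_if)
        return bool(self.variables) and all(text.strip() for text in texts)


plan = AnalysisPlan(
    hypothesis="CMP→CDP conversion is blocked",
    statistical_test="Welch's t-test",  # assumed choice for a two-group comparison
    variables=("CMP", "CDP-Choline"),
    supports_if="the ratio differs between groups in the predicted direction",
    rejects_if="the ratio shows no group difference or the opposite direction",
)
print(plan.is_complete())
```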

Example Workflow

Observation: "CMP is elevated in hypothermia (FC=1.8, p<0.01)"

Step 1: Search literature

search_pubmed("CMP metabolism nucleotide salvage")

Finds: CMP is product of nucleotide salvage pathway

Step 2: Generate hypotheses

H1: "Salvage flux is increased" (upstream cause)
H2: "CMP→CDP conversion is blocked" (downstream bottleneck)
H3: "CMP degradation is reduced" (clearance issue)

Step 3: Prioritize

H1: Impact=4, Feasibility=4, Novelty=3, Coherence=4 → Score=3.8
H2: Impact=5, Feasibility=5, Novelty=4, Coherence=4 → Score=4.7 ⭐
H3: Impact=3, Feasibility=2, Novelty=2, Coherence=3 → Score=2.5

Step 4: Test H2 first

Calculate CDP-Choline Synthesis Index = CMP / CDP-Choline
Compare across groups
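One way to sketch the index calculation and group comparison is shown below, using only the standard library. The source says only "compare across groups", so Welch's t statistic is an assumed choice here. The function names are hypothetical, and the numbers are made-up placeholders rather than measurements.

```python
from math import sqrt
from statistics import mean, variance


def synthesis_index(cmp_level: float, cdp_choline_level: float) -> float:
    """CDP-Choline Synthesis Index = CMP / CDP-Choline for one sample."""
    if cdp_choline_level <= 0:
        raise ValueError("CDP-Choline level must be positive")
    return cmp_level / cdp_choline_level


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for two independent samples (unequal variances).

    Each group needs at least two values with non-zero spread.
    """
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Made-up placeholder (CMP, CDP-Choline) pairs, NOT real measurements.
hypothermia = [(1.8, 0.9), (2.1, 1.0), (1.6, 0.7)]
control = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.0)]

hypothermia_index = [synthesis_index(c, d) for c, d in hypothermia]
control_index = [synthesis_index(c, d) for c, d in control]

print(f"mean index, hypothermia: {mean(hypothermia_index):.2f}")
print(f"mean index, control:     {mean(control_index):.2f}")
print(f"Welch t statistic:       {welch_t(hypothermia_index, control_index):.2f}")
```

The sketch stops at the test statistic. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the statistic together with a p-value.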

Common Pitfalls to Avoid

❌ Don't repeat rejected hypotheses

Check the knowledge graph for what's already been ruled out

❌ Don't cherry-pick

Test hypotheses systematically, not just ones likely to succeed

❌ Don't ignore negative results

Failed hypotheses are valuable - they constrain the solution space

❌ Don't generate untestable hypotheses

If you can't test it with current data, it's not useful now

When to Stop

You have enough hypotheses when:

You have 2-3 high-priority (score >4.0) hypotheses ready to test
You've covered the main alternative explanations
Further brainstorming is giving diminishing returns

Don't generate 20 hypotheses - focus on quality over quantity.
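The first stopping criterion is mechanical enough to check in code. The other two (coverage of alternative explanations, diminishing returns) remain judgment calls. This is a small sketch with a hypothetical helper name and arbitrary example scores.

```python
def enough_high_priority(scores: list[float], threshold: float = 4.0, minimum: int = 2) -> bool:
    """True when at least `minimum` hypotheses score strictly above `threshold`."""
    return sum(1 for score in scores if score > threshold) >= minimum


# Arbitrary example scores, not taken from a real analysis.
print(enough_high_priority([4.5, 4.2, 3.1]))  # True: two scores exceed 4.0
print(enough_high_priority([4.5, 3.9, 3.1]))  # False: only one exceeds 4.0
```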