hypothesis-generation - SKILL.md Agent Skill

name: hypothesis-generation description: Formulate research hypotheses using structured frameworks. Use when developing research questions, designing experiments, or planning studies with testable predictions.

Hypothesis Generation

Structured frameworks for developing research hypotheses and experimental designs.

When to Use

Starting a new research project
Developing research questions
Planning experiments
Generating testable predictions
Exploring competing explanations

Hypothesis Framework

Good Hypothesis Characteristics

A strong research hypothesis should be:

Specific: Clear, precise statement
Testable: Can be validated with data
Falsifiable: Can potentially be proven wrong
Grounded: Based on prior knowledge/theory
Novel: Adds something new to the field

Hypothesis Types

Type	Description	Example
Descriptive	Describes a phenomenon	"LLMs exhibit X behavior on task Y"
Relational	Proposes relationship	"Factor A correlates with outcome B"
Causal	Claims causation	"Intervention X causes improvement Y"
Comparative	Compares conditions	"Method A outperforms method B on task C"
Mechanistic	Explains how/why	"Effect X occurs because of mechanism Y"

Hypothesis Development Process

Step 1: Identify the Gap

From your literature review, identify:

What is known
What is unknown or unclear
What is contradictory

Document the gap:

## Research Gap
**Known**: [What prior work has established]
**Unknown**: [What remains to be discovered]
**Our Focus**: [Which unknown we address]

Step 2: Generate Initial Hypotheses

Use these prompts:

"If [assumption] is true, then we should observe [prediction]"
"Based on [theory/observation], we expect [outcome]"
"Contrary to [current belief], we propose [alternative]"

Generate multiple hypotheses (aim for 3-5 initially).

Step 3: Develop Competing Hypotheses

For each hypothesis, identify:

Alternative explanations: What else could explain the same observation?
Null hypothesis: What if there's no effect?
Opposite hypothesis: What if the effect is reversed?

Step 4: Operationalize

Convert abstract hypothesis to concrete, measurable terms:

Abstract	Operationalized
"LLMs understand X"	"GPT-4 achieves >80% accuracy on benchmark Y"
"Method A is better"	"Method A improves F1 by >5% over baseline B"
"Training affects X"	"Models trained with X show Y behavior increase"

Step 5: Design Tests

For each hypothesis, define:

Data: What data is needed?
Method: How will you test?
Metrics: What measures success/failure?
Threshold: What counts as support/rejection?

Competing Hypotheses Framework

Template

## Research Question
[Your main question]

### Hypothesis 1: [Name]
**Statement**: [Formal hypothesis]
**Rationale**: [Why this might be true]
**Prediction**: [What we expect to observe]
**Test**: [How to test]

### Hypothesis 2: [Alternative]
**Statement**: [Formal hypothesis]
**Rationale**: [Why this might be true]
**Prediction**: [What we expect to observe]
**Test**: [How to test]

### Hypothesis 3: [Null]
**Statement**: There is no significant effect
**Prediction**: No difference between conditions
**Test**: Statistical significance testing

### Decision Matrix
| Outcome | Supports H1 | Supports H2 | Supports H3 |
|---------|-------------|-------------|-------------|
| [Result A] | Yes | No | No |
| [Result B] | No | Yes | No |
| [Result C] | No | No | Yes |

Experimental Design

Variables

Type	Definition	Example
Independent (IV)	What you manipulate	Model type, training data
Dependent (DV)	What you measure	Accuracy, F1, latency
Controlled	Held constant	Prompt template, temperature
Confounding	Could affect DV	Data contamination, model size

Design Types

Between-subjects: Different conditions get different treatments

Pros: No carryover effects
Cons: Need more samples, individual differences

Within-subjects: Same subject gets all treatments

Pros: Controls individual differences
Cons: Order effects, fatigue

Factorial: Multiple IVs crossed

Pros: Tests interactions
Cons: More conditions needed

Control Strategies

Baseline comparison: Compare against known baseline
Ablation study: Remove components to test necessity
Randomization: Random assignment to conditions
Counterbalancing: Vary order across subjects/trials

Prediction Documentation

Template for Each Hypothesis

## Hypothesis: [Name]

### Formal Statement
[If X, then Y under conditions Z]

### Background
[Why we think this might be true]

### Predictions

#### Primary Prediction
- **Measure**: [What to measure]
- **Expected outcome**: [Specific prediction]
- **Threshold for support**: [Quantitative criterion]

#### Secondary Predictions
1. [Additional prediction 1]
2. [Additional prediction 2]

### Potential Confounds
- [Confound 1]: [How to address]
- [Confound 2]: [How to address]

### What Would Falsify This?
[Specific outcomes that would reject hypothesis]

Common Pitfalls

Avoid These

Vague hypotheses: "Method A is good" → "Method A achieves >X on benchmark Y"
Unfalsifiable claims: "LLMs may sometimes..." → "LLMs will show X in condition Y"
Post-hoc hypothesizing: Generating hypothesis after seeing data
Confirmation bias: Only looking for supporting evidence
Missing null hypothesis: Not considering "no effect" possibility

Warning Signs

Hypothesis can explain any outcome
No clear way to measure
Based on single observation
Ignores contradictory evidence
No alternative hypotheses considered

Quality Checklist

Hypothesis is specific and clear
Hypothesis is testable with available resources
Hypothesis is falsifiable
Hypothesis is grounded in prior work
Alternative hypotheses identified
Null hypothesis specified
Variables operationalized
Confounds identified and addressed
Success/failure criteria defined
Predictions documented before experimentation

References

See references/ folder for:

hypothesis_templates.md: Templates for different research types