plan - SKILL.md Agent Skill

name: plan description: "Generate a physics analysis plan for a new or reinterpretation study. Use when planning an original analysis, a reinterpretation, or when asked to create an analysis plan. Triggers on: create a prd, write analysis plan, plan this analysis, spec out, plan this study." user-invocable: true

Analysis Plan Generator

Create a detailed physics analysis plan that is scientifically clear, actionable, and suitable for conversion into project.json milestones via /setup.

Important: Do NOT start implementing. Just create the plan.

The Job

Receive an analysis description from the researcher
Ask 3-5 essential clarifying questions (with lettered options)
Generate a structured analysis plan based on answers
Save to analysis-plan.md at the repo root

Step 1: Clarifying Questions

Ask only critical questions where the initial description is ambiguous. Focus on:

Scientific goal: What physics question does this address?
Dataset: What data or simulation is available?
Method: Is the approach known, or does it need to be designed?
Scope: Reimplementation, reinterpretation, or new analysis?
Success criteria: What does a successful outcome look like?

Format Questions Like This:

1. What is the primary scientific goal?
   A. Reproduce existing results from a paper
   B. Reinterpret existing results with a new method
   C. Perform an original search/measurement
   D. Other: [please specify]

2. What dataset will be used?
   A. Existing public dataset (specify which)
   B. Simulated data to be generated
   C. Real collision data (specify experiment)
   D. Other: [please specify]

3. What is the analysis method?
   A. Cut-based selection
   B. ML-based classifier
   C. Density estimation / anomaly detection
   D. Statistical fit / limit setting

This lets the researcher respond with "1B, 2A, 3C" for quick iteration.

Step 2: Analysis Plan Structure

1. Overview

Brief description of the physics goal and approach.

2. Scientific Goals

Specific, measurable objectives:

What observable or result will be produced?
What comparison or baseline exists?
What would constitute a successful reimplementation or new result?

3. Analysis Steps

Break the analysis into logical stages, each small enough for one Overwatcher iteration. For each step:

### Step N: [Title]
**Goal:** What this step produces
**Method:** How it will be done
**Success criterion:** How we know it worked (quantitative where possible)
**Depends on:** Which previous steps must be complete first

4. Dataset & Inputs

Data source (paper, public repository, simulation)
Format (ROOT, HDF5, CSV, etc.)
Expected size and key features

5. Key Observables

Which quantities will be computed
Expected distributions or values (from paper or physics intuition)

6. Statistical Treatment

How results will be quantified (significance, limits, comparison metrics)
Treatment of uncertainties (statistical, systematic)

7. Non-Goals

What this analysis will NOT do — important for keeping scope manageable.

8. Open Questions

Physics or technical questions that need to be resolved during the analysis.

Writing for the Overwatcher

The plan will be read by the Overwatcher agent and converted to project.json milestones. Therefore:

Be explicit and unambiguous — "apply mjj > 1000 GeV cut" not "select high-mass events"
Success criteria must be verifiable by the Tester agent
Steps should be independent enough to run in isolation as law tasks
Reference the paper (arXiv ID) wherever specific numbers come from

Output

Format: Markdown (.md)
Location: repo root
Filename: analysis-plan.md

Example

# Analysis Plan: CATHODE Reinterpretation with Supervised Classifier

## Overview

Reinterpret the CATHODE anomaly detection analysis (arXiv:2109.00546) by
replacing the unsupervised flow-based classifier with a supervised classifier
trained on a known BSM signal. Goal: compare sensitivity between the two approaches.

## Scientific Goals

- Reproduce the CATHODE baseline result (AUC, significance) within 5%
- Train a supervised classifier on a benchmark signal (Z' → qq, m=3 TeV)
- Compare ROC curves and expected significance between supervised and unsupervised

## Analysis Steps

### Step 1: Paper specification
**Goal:** Full methodology spec extracted from arXiv:2109.00546
**Method:** Paper Analyst reads paper and produces spec.md
**Success criterion:** spec.md covers dataset, signal region, flow architecture,
classifier setup, and all expected results from the paper
**Depends on:** Nothing

### Step 2: Data loading and exploration
**Goal:** Dataset loaded, key distributions reproduced
**Method:** Law task LoadData reads HDF5 files, plots mjj, pT, and substructure variables
**Success criterion:** mjj distribution matches paper Figure 1 qualitatively
**Depends on:** Step 1

### Step 3: Signal region selection
**Goal:** Event selection matching paper Section 2
**Method:** Law task SelectEvents applies mjj window and object cuts
**Success criterion:** Event yields within 10% of paper Table 1
**Depends on:** Step 2

...

## Dataset & Inputs

- Source: LHCO R&D dataset (zenodo.org/record/6466204)
- Format: HDF5, ~1M events, features: mjj, m1, m2, tau21_1, tau21_2
- Already available in source/

## Key Observables

- mjj: dijet invariant mass, signal region 3.3-3.7 TeV
- Classifier score distribution in signal vs sideband
- ROC AUC: expect ~0.8 for CATHODE baseline (paper Figure 3)

## Statistical Treatment

- Significance estimated via likelihood ratio test
- Systematic uncertainties: limited to statistical for this study
- Trial factor: single signal region, no LEE correction needed

## Non-Goals

- No real data — simulation only
- No full systematic uncertainty treatment
- No publication-quality plots (clarity over style)

## Open Questions

- Does the supervised classifier require signal injection during training, or is
  a pure signal sample sufficient?
- What signal cross-section assumption to use for sensitivity comparison?

Checklist

Before saving analysis-plan.md:

Asked clarifying questions with lettered options
Incorporated researcher's answers
Each step has a verifiable success criterion
Dataset and inputs are specified concretely
Non-goals define clear scope boundaries
Open questions are listed
Saved to analysis-plan.md