study-eval - SKILL.md Agent Skill

name: study-eval
description: >
  Core framework for evaluating research papers against a structured factor glossary. Supports any domain via DomainConfig JSON files. Use when setting up a new evaluation domain or understanding the framework.
user-invocable: false

Study Evaluation Framework — Core Architecture

This skill provides background knowledge about the domain-agnostic study evaluation framework. It is not directly invocable; use a domain-specific skill (e.g., study-eval-neuro, study-eval-electronics) for actual evaluations.

Overview

The framework evaluates research papers by scoring them against a structured glossary of factors, across one or more contexts, organized into theory groups. Everything domain-specific is defined in a DomainConfig JSON file.

Key Concepts

DomainConfig

A single JSON file that fully parameterizes the framework for a specific research domain. See domain-config-schema.md for the full schema.

Contexts

Independent evaluation dimensions (e.g., "Local Oddball" and "Global Oddball" in neuroscience, or "High Frequency" and "Low Frequency" in electronics). Each paper is scored separately for each context.

Theory Groups

Logical groupings of factors (e.g., H1/H2/H3 hypotheses in neuroscience). Used for:

Organizing the glossary display
Computing group averages
Axis labels in visualizations (3D scatter plots)

Glossary Factors

Individual measurable/scorable items within the domain. Each factor has:

id: Numeric identifier
def: Human-readable definition
rel: Relationship keywords (evidence cues)
tag: "Quantitative", "Qualitative", or "Methodological"
modes: Which contexts the factor applies to (e.g., ["LO", "GO"])
theory_group: Which theory group this factor belongs to
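
As an illustration, a single factor entry could be checked as follows. The field names and the three tag values come from the list above. The `validate_factor` helper, the rule that `modes` must be non-empty, and the placeholder values in `example_factor` are assumptions made for this sketch and are not taken from the framework's own validator.

```python
# Hypothetical sketch: checks one glossary factor entry against the fields listed above.
REQUIRED_FIELDS = ("id", "def", "rel", "tag", "modes", "theory_group")
ALLOWED_TAGS = {"Quantitative", "Qualitative", "Methodological"}


def validate_factor(entry: dict) -> list:
    """Return a list of human-readable problems; an empty list means the entry looks fine."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if "tag" in entry and entry["tag"] not in ALLOWED_TAGS:
        problems.append(f"unknown tag: {entry['tag']!r}")
    # Assumption: a factor must apply to at least one context.
    if "modes" in entry and not entry["modes"]:
        problems.append("modes must list at least one context")
    return problems


# Placeholder values for illustration only; real definitions live in the DomainConfig file.
example_factor = {
    "id": 1,
    "def": "Placeholder definition text",
    "rel": ["placeholder keyword"],
    "tag": "Quantitative",
    "modes": ["LO", "GO"],
    "theory_group": "H1",
}

print(validate_factor(example_factor))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a large glossary in a single pass.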

Scoring Scale

The scoring scale is configurable per domain. The default ranges from -1.0 to +1.0, with np.nan marking a factor that the paper does not address.
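
A minimal sketch of how the default scale might be handled when computing a theory group average. It assumes that not-addressed scores are skipped rather than counted as zero, which is the behaviour of `np.nanmean`. The function names are hypothetical, and the standard library's `math.nan` is used so the example has no dependencies (`np.nan` is an ordinary float NaN, so the same checks apply to it).

```python
import math

# Default scale from the text above; a domain may configure different bounds.
SCORE_MIN, SCORE_MAX = -1.0, 1.0


def is_valid_score(score: float) -> bool:
    """A score is valid if it is NaN (not addressed) or lies within the scale."""
    return math.isnan(score) or SCORE_MIN <= score <= SCORE_MAX


def group_average(scores) -> float:
    """Average the addressed scores; return NaN if no factor was addressed."""
    addressed = [s for s in scores if not math.isnan(s)]
    if not addressed:
        return math.nan
    return sum(addressed) / len(addressed)


print(group_average([1.0, math.nan, 0.0]))
```

Skipping NaN keeps an unaddressed factor from pulling a group average toward zero, which would otherwise be indistinguishable from neutral evidence.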

File Structure

study-eval/
├── domains/
│   └── <domain_id>.json          # DomainConfig files
├── core/
│   ├── config.py                  # DomainConfig loader + validator
│   ├── columns.py                 # Column naming utilities
│   ├── prompts.py                 # Prompt template generation
│   ├── evaluation.py              # AI evaluation pipeline
│   ├── dashboard.py               # Interactive dashboard
│   └── visualization.py           # Plotting functions
└── notebooks/
    └── generic_eval_demo.ipynb    # Demo notebook

Creating a New Domain

Copy an existing domain config (e.g., domains/neuroscience_predictive_coding.json)
Modify all fields: domain info, contexts, theory_groups, study_types, glossary
Create a corresponding skill in .claude/skills/study-eval-<domain>/
Once the config and skill are in place, the framework automatically adapts column naming, prompts, the dashboard, and visualizations.
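
Before wiring up a new domain, it can help to confirm that the edited config still contains every section mentioned in the steps above. The sketch below is an assumption-laden illustration: the top-level key names (in particular `domain`) and both helper functions are hypothetical, and the authoritative names are those in domain-config-schema.md and core/config.py.

```python
import json

# Assumed top-level keys, based on the fields named in the steps above.
REQUIRED_SECTIONS = ("domain", "contexts", "theory_groups", "study_types", "glossary")


def missing_sections(config: dict) -> list:
    """Return the required top-level sections that are absent from the config."""
    return [key for key in REQUIRED_SECTIONS if key not in config]


def load_domain_config(path: str) -> dict:
    """Load a DomainConfig JSON file and fail early if a section is missing."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    missing = missing_sections(config)
    if missing:
        raise ValueError(f"{path} is missing sections: {', '.join(missing)}")
    return config


print(missing_sections({"domain": {}, "contexts": [], "glossary": {}}))
```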

Column Naming Convention

Factor columns follow the pattern: {context.column_prefix}_{sanitized_factor_name}

Sanitization: spaces → underscores, parentheses removed.

Example: Factor "Subtractive Inhibition (SST)" in context "Local Oddball" (prefix: Local_Oddball) → Local_Oddball_Subtractive_Inhibition_SST
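
The naming rule can be sketched in a few lines. The function names here are hypothetical and the real implementation lives in core/columns.py; this version applies only the two sanitization rules stated above.

```python
def sanitize_factor_name(name: str) -> str:
    """Remove parentheses, then replace spaces with underscores."""
    return name.replace("(", "").replace(")", "").replace(" ", "_")


def factor_column(column_prefix: str, factor_name: str) -> str:
    """Build a column name as {column_prefix}_{sanitized_factor_name}."""
    return f"{column_prefix}_{sanitize_factor_name(factor_name)}"


print(factor_column("Local_Oddball", "Subtractive Inhibition (SST)"))
# prints Local_Oddball_Subtractive_Inhibition_SST
```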

Visualization Dispatch

The framework selects the visualization type based on the number of theory groups:

1 group → 1D strip plot
2 groups → 2D scatter
3 groups → 3D scatter (default for neuroscience)
4+ groups → Radar/spider chart
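
The dispatch rule above amounts to a lookup on the group count. In the sketch below, the function name and the returned labels are hypothetical; the actual plotting functions are in core/visualization.py.

```python
def select_visualization(n_groups: int) -> str:
    """Map the number of theory groups to a plot type label (labels are illustrative)."""
    if n_groups < 1:
        raise ValueError("a domain needs at least one theory group")
    if n_groups == 1:
        return "strip_1d"
    if n_groups == 2:
        return "scatter_2d"
    if n_groups == 3:
        return "scatter_3d"
    # Four or more groups no longer fit on spatial axes, so fall back to a radar chart.
    return "radar"


for count in (1, 2, 3, 5):
    print(count, select_visualization(count))
```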