name: study-eval
description: >
  Core framework for evaluating research papers against a structured factor glossary.
  Supports any domain via DomainConfig JSON files. Use when setting up a new
  evaluation domain or understanding the framework.
user-invocable: false
Study Evaluation Framework — Core Architecture
This skill provides background knowledge about the domain-agnostic study evaluation framework. It is not directly invocable — use domain-specific skills (e.g., study-eval-neuro, study-eval-electronics) for actual evaluations.
Overview
The framework evaluates research papers by scoring them against a structured glossary of factors, across one or more contexts, organized into theory groups. Everything domain-specific is defined in a DomainConfig JSON file.
Key Concepts
DomainConfig
A single JSON file that fully parameterizes the framework for a specific research domain. See domain-config-schema.md for the full schema.
Contexts
Independent evaluation dimensions (e.g., "Local Oddball" and "Global Oddball" in neuroscience, or "High Frequency" and "Low Frequency" in electronics). Each paper is scored separately for each context.
Theory Groups
Logical groupings of factors (e.g., H1/H2/H3 hypotheses in neuroscience). Used for:
- Organizing the glossary display
- Computing group averages
- Axis labels in visualizations (3D scatter plots)
Glossary Factors
Individual measurable/scorable items within the domain. Each factor has:
- id: Numeric identifier
- def: Human-readable definition
- rel: Relationship keywords (evidence cues)
- tag: "Quantitative", "Qualitative", or "Methodological"
- modes: Which contexts the factor applies to (e.g., ["LO", "GO"])
- theory_group: Which theory group this factor belongs to
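As an illustration, a glossary entry carrying these fields might look like the sketch below. The field names come from the list above; the "name" key, the choice of a list for rel, every value shown, and the applies_to helper are assumptions made for the example. The authoritative layout is in domain-config-schema.md.

```python
from typing import Any, Dict

# Illustrative glossary entry. Field names follow the list above; the
# "name" key and all values are placeholders, not taken from a real config.
factor: Dict[str, Any] = {
    "name": "Subtractive Inhibition (SST)",  # assumed key for the factor name
    "id": 1,
    "def": "Placeholder definition text",
    "rel": ["placeholder keyword"],          # assumed to be a list of cues
    "tag": "Quantitative",
    "modes": ["LO", "GO"],
    "theory_group": "H1",
}

# The three tag values named in the field list.
ALLOWED_TAGS = {"Quantitative", "Qualitative", "Methodological"}


def applies_to(entry: Dict[str, Any], mode: str) -> bool:
    """Hypothetical helper: True if the factor is scored in the given context."""
    return mode in entry["modes"]


print(factor["tag"] in ALLOWED_TAGS)  # True
print(applies_to(factor, "LO"))       # True
```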
Scoring Scale
The scale is configurable per domain. The default range is -1.0 to +1.0, with np.nan marking factors that a paper does not address.
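The sketch below shows one way the default scale could feed a theory group average. It assumes that not-addressed factors are skipped when averaging, which this document does not state. Both function names are hypothetical. Since np.nan is an ordinary float NaN, the example uses only the standard library.

```python
import math
from typing import List

# Default range from the text above; a domain may configure different bounds.
SCORE_MIN, SCORE_MAX = -1.0, 1.0


def is_valid_score(score: float) -> bool:
    """True for NaN (not addressed) or a value inside the default range."""
    return math.isnan(score) or SCORE_MIN <= score <= SCORE_MAX


def group_average(scores: List[float]) -> float:
    """Mean of the addressed scores; NaN when no factor was addressed.

    Skipping NaN entries is an assumption of this sketch.
    """
    addressed = [s for s in scores if not math.isnan(s)]
    if not addressed:
        return math.nan
    return sum(addressed) / len(addressed)


h1_scores = [1.0, -0.5, math.nan]  # third factor not addressed
print(group_average(h1_scores))    # 0.25
```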
File Structure
study-eval/
├── domains/
│ └── <domain_id>.json # DomainConfig files
├── core/
│ ├── config.py # DomainConfig loader + validator
│ ├── columns.py # Column naming utilities
│ ├── prompts.py # Prompt template generation
│ ├── evaluation.py # AI evaluation pipeline
│ ├── dashboard.py # Interactive dashboard
│ └── visualization.py # Plotting functions
└── notebooks/
└── generic_eval_demo.ipynb # Demo notebook
Creating a New Domain
- Copy an existing domain config (e.g., domains/neuroscience_predictive_coding.json)
- Modify all fields: domain info, contexts, theory_groups, study_types, glossary
- Create a corresponding skill in .claude/skills/study-eval-<domain>/
- The framework automatically adapts column naming, prompts, dashboard, and visualizations
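After editing a copied config, a quick sanity check can catch a section that was dropped by accident. The sketch below assumes that contexts, theory_groups, study_types, and glossary are top-level keys of the JSON file, which should be confirmed against domain-config-schema.md. The missing_sections helper is hypothetical and is not the validator in core/config.py. An inline JSON string stands in for a real file so the example is self-contained.

```python
import json
from typing import Any, Dict, List

# Assumed top-level keys, taken from the field names listed in the steps above.
REQUIRED_SECTIONS = ("contexts", "theory_groups", "study_types", "glossary")


def missing_sections(config: Dict[str, Any]) -> List[str]:
    """Return the assumed required sections that the config lacks."""
    return [key for key in REQUIRED_SECTIONS if key not in config]


# Stand-in for the contents of a domains/<domain_id>.json file.
raw = '{"contexts": [], "theory_groups": [], "glossary": []}'
config = json.loads(raw)

print(missing_sections(config))  # ['study_types']
```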
Column Naming Convention
Factor columns follow the pattern: {context.column_prefix}_{sanitized_factor_name}
Sanitization: spaces → underscores, parentheses removed.
Example: Factor "Subtractive Inhibition (SST)" in context "Local Oddball" (prefix: Local_Oddball) → Local_Oddball_Subtractive_Inhibition_SST
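The naming rule above can be sketched in a few lines. The function names are hypothetical and the actual code in core/columns.py may differ; only the two sanitization rules stated above (spaces to underscores, parentheses removed) are applied.

```python
def sanitize_factor_name(name: str) -> str:
    """Remove parentheses, then replace spaces with underscores."""
    return name.replace("(", "").replace(")", "").replace(" ", "_")


def factor_column(column_prefix: str, factor_name: str) -> str:
    """Build a column name as {column_prefix}_{sanitized_factor_name}."""
    return f"{column_prefix}_{sanitize_factor_name(factor_name)}"


print(factor_column("Local_Oddball", "Subtractive Inhibition (SST)"))
# Local_Oddball_Subtractive_Inhibition_SST
```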
Visualization Dispatch
The framework selects visualization type based on theory group count:
- 1 group → 1D strip plot
- 2 groups → 2D scatter
- 3 groups → 3D scatter (default for neuroscience)
- 4+ groups → Radar/spider chart
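The dispatch rule above amounts to a lookup on the number of theory groups. The following is a minimal sketch; the function name and the returned labels are hypothetical and are not the identifiers used in core/visualization.py.

```python
def select_visualization(n_theory_groups: int) -> str:
    """Map a theory group count to a plot type, following the list above."""
    if n_theory_groups < 1:
        raise ValueError("at least one theory group is required")
    if n_theory_groups == 1:
        return "strip_1d"
    if n_theory_groups == 2:
        return "scatter_2d"
    if n_theory_groups == 3:
        return "scatter_3d"
    return "radar"  # four or more groups


for count in (1, 2, 3, 5):
    print(count, select_visualization(count))
```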