name: data-analysis-workflow
description: Orchestrate practical data science work from vague request to verified analysis, report, or handoff. Use when Codex needs to plan or execute an end-to-end data analysis, decide which data science sub-skill to use, turn a stakeholder question into an analysis workflow, review an analysis for completeness, or coordinate EDA, modeling, interpretation, communication, and reproducibility.
Data Analysis Workflow
Use this as the router and control skill for data science tasks. Keep the work question-first, evidence-driven, and reproducible.
Domain Context Carry-Forward
The Domain Context Contract from domain-problem-interviewer-researcher is required context for every downstream stage when the work starts from a business, domain, operational, or stakeholder problem. Carry it into question framing, EDA, modeling, mathematical method selection, and reporting. If no contract exists, create one or explicitly state why it is not needed. Update the contract when data evidence or research changes the understanding.
Every stage must adapt choices to the contract: unit of analysis, target/KPI, operational constraints, stakeholder decision, prohibited claims, domain terminology, and success metric.
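One way to keep the contract items listed above in a single place is a small record type. The sketch below is illustrative only: the class name, field names, and the `missing_fields` helper are assumptions for this example, not a schema defined by domain-problem-interviewer-researcher, and the sample values are invented placeholders.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DomainContextContract:
    """Hypothetical container for the contract items named in this section."""

    unit_of_analysis: str = ""
    target_kpi: str = ""
    operational_constraints: list[str] = field(default_factory=list)
    stakeholder_decision: str = ""
    prohibited_claims: list[str] = field(default_factory=list)
    domain_terminology: dict[str, str] = field(default_factory=dict)
    success_metric: str = ""

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder values, invented for illustration only.
contract = DomainContextContract(
    unit_of_analysis="customer-month",
    target_kpi="30-day churn rate",
    stakeholder_decision="which accounts receive a retention call",
)

# Lists the items a downstream stage would still need before relying on the contract.
print(contract.missing_fields())
```

Checking for empty fields at the start of each stage is one simple way to notice that the contract needs updating before choices depend on it.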
Workflow
- State the decision or question in plain language.
- If the business problem, domain, stakeholder, or success criteria are unclear, use domain-problem-interviewer-researcher before touching data and require its Domain Context Contract.
- Classify the question as descriptive, exploratory, inferential, predictive, causal, mechanistic, or mixed using the domain contract.
- Inventory the available data, code, constraints, and expected output artifact.
- Run the smallest deterministic data checks before modeling.
- Choose the analysis lane:
  - Use domain-problem-interviewer-researcher when the request lacks business context, domain understanding, stakeholder goals, operating constraints, or when web/domain research is needed before planning.
  - Use analytic-question-framing when the question is vague or mismatched to the data.
  - Use data-checking-eda before trusting any dataset, join, summary, or plot.
  - Use modeling-strategy-review before fitting, changing, or interpreting models.
  - Use reproducible-analysis-reporting before handing off notebooks, reports, slides, or scripts.
  - Use mathematical-data-science-foundations for high-dimensional, graph, sketching, clustering, topic model, SVD/PCA, or algorithmic foundations questions.
- Record assumptions, checks run, failed checks, and residual risk.
- Prefer a narrow verified slice over a broad unverified analysis.
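The lane choice in the workflow above can be pictured as a lookup from observed conditions to sub-skills. This is a rough sketch, not part of the skill itself: the sub-skill names come from this document, but the trigger flags, the `choose_lanes` function, and the table ordering are hypothetical choices made for illustration.

```python
# Sub-skill names are taken from this document; the trigger flags are
# hypothetical labels invented for this sketch.
LANES = [
    ("domain-problem-interviewer-researcher", "missing_domain_context"),
    ("analytic-question-framing", "vague_question"),
    ("data-checking-eda", "untrusted_data"),
    ("modeling-strategy-review", "model_work"),
    ("mathematical-data-science-foundations", "foundations_question"),
    ("reproducible-analysis-reporting", "handoff"),
]


def choose_lanes(flags: set[str]) -> list[str]:
    """Return the sub-skills whose trigger flag is set, in table order."""
    unknown = flags - {flag for _, flag in LANES}
    if unknown:
        # Fail loudly rather than silently ignoring a misspelled flag.
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    return [skill for skill, flag in LANES if flag in flags]


print(choose_lanes({"vague_question", "untrusted_data"}))
```

In practice several lanes usually apply to one request, which is why the sketch returns a list rather than a single sub-skill.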
Default Deliverables
- A crisp analytic question.
- A business/domain brief and Domain Context Contract when the request starts from a need, opportunity, risk, or broad stakeholder query.
- Dataset and code inventory.
- Data checking evidence.
- Analysis method and why it matches the question type.
- Results with uncertainty and caveats.
- Reproducibility notes: inputs, commands, environment, seeds, and generated outputs.
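The reproducibility notes in the list above could be captured as a small manifest written next to the outputs. The following is a minimal sketch using only the Python standard library; the function names `sha256_of` and `build_manifest` and the manifest layout are assumptions for this example, not a format this skill prescribes.

```python
import hashlib
import platform
import sys
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large inputs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(inputs, command, seed, outputs):
    """Collect inputs, command, environment, seed, and outputs in one dict."""
    return {
        # Content hashes make it possible to detect silently changed inputs.
        "inputs": {str(p): sha256_of(Path(p)) for p in inputs},
        "command": command,
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
        "seed": seed,
        "outputs": [str(p) for p in outputs],
    }
```

The returned dict contains only strings, numbers, lists, and dicts, so it can be saved with `json.dump` alongside the generated outputs.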
Guardrails
- Do not fit a model before checking row counts, column semantics, missingness, units, duplicates, and target leakage risk.
- Do not use causal language for exploratory, inferential, or predictive analyses without a causal design.
- Do not claim reproducibility from a notebook that has not been rerun from a clean state.
- Do not optimize for complex methods when a direct plot, table, baseline, or simple model answers the question.
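The pre-modeling checks named in the first guardrail can start very small. The sketch below is an assumption-laden illustration using only the standard library: `basic_checks` and its report keys are hypothetical names, the sample rows are invented, and the leakage test is a crude hint that only catches a column copying the target exactly. Column semantics and units still need a human read of the data dictionary.

```python
from collections import Counter


def basic_checks(rows, target):
    """Run small deterministic checks on a list of dict rows (hashable values)."""
    columns = list(rows[0]) if rows else []
    # Count empty or None cells per column.
    missing = {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }
    # Count rows that repeat an earlier row exactly.
    row_counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicate_rows = sum(count - 1 for count in row_counts.values())
    # Crude leakage hint: a non-target column identical to the target in every row.
    leakage_suspects = [
        col
        for col in columns
        if col != target
        and rows
        and all(row.get(col) == row.get(target) for row in rows)
    ]
    return {
        "row_count": len(rows),
        "missing_by_column": missing,
        "duplicate_rows": duplicate_rows,
        "leakage_suspects": leakage_suspects,
    }


# Invented sample rows, for illustration only.
rows = [
    {"id": "1", "spend": "10", "churned": "0", "churn_flag": "0"},
    {"id": "2", "spend": "", "churned": "1", "churn_flag": "1"},
    {"id": "2", "spend": "", "churned": "1", "churn_flag": "1"},
]
report = basic_checks(rows, target="churned")
print(report)
```

A report like this is cheap to rerun after every join or filter, which fits the preference for a narrow verified slice over a broad unverified analysis.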