name: problem-classifier
description: Classify each subquestion into standard mathematical modeling problem types.
license: MIT
Purpose
Classify each parsed subquestion into standard mathematical modeling problem types.
This skill converts a validated problem parse into a problem-type map. It identifies the core task type of each subquestion, explains the classification reason, lists candidate method families at a high level, and flags common misclassification risks.
This skill does not select the final model, generate code, clean data, or write paper sections.
When to use
Use this skill:
- After problem-parser has produced a validated problem parse.
- Before related-paper-analyzer and method-selector.
- When subquestions need to be mapped to standard mathematical modeling types.
- When a problem appears mixed and the team needs to separate evaluation, prediction, optimization, mechanism, classification, graph, simulation, or hybrid components.
- When the team is tempted to choose a model before clarifying the task type.
Preconditions
A validated problem parse should exist and include:
- subquestions
- goals
- objects
- constraints
- data inventory
- required outputs
- preliminary variables or relationships if available
- ambiguity and risk notes
If the problem parse is missing or incomplete, hand back to problem-parser.
Inputs
Use or request:
- workspace/problem/problem-parser/problem_parse.json, if available.
- Parsed subquestions from problem-parser.
- Goals, objects, constraints, data, and required outputs.
- Preliminary variables, controllable quantities, observable quantities, and relationships.
- Ambiguity and risk flags from the parsing stage.
- User notes about contest constraints or team implementation limits, if relevant.
Workflow
Read the parsed subquestions.
- Classify each subquestion separately.
- Do not assign one coarse type to the whole problem unless the problem truly has only one task.
Identify the main verb and expected output.
- Evaluation tasks often ask to evaluate, rank, compare, score, classify quality, or assess risk.
- Prediction tasks often ask to forecast, estimate, infer future values, fill unknown trends, or quantify uncertainty.
- Optimization tasks often ask to choose, allocate, schedule, route, maximize, minimize, or design a feasible plan.
- Mechanism tasks often ask to explain system behavior through equations, physical rules, dynamic processes, or causal structure.
- Classification or clustering tasks often ask to group, label, segment, identify categories, or detect abnormal samples.
- Graph or routing tasks often involve nodes, edges, paths, networks, flows, connectivity, matching, or traversal.
- Simulation tasks often require scenario generation, stochastic processes, agent behavior, Monte Carlo trials, or repeated process imitation.
- Hybrid tasks combine multiple task types and should be decomposed by subquestion or modeling layer.
Suggest the primary type, but do not finalize the framing.
- Emit the AI's pick as ai_suggested_type with an ai_suggestion_confidence, keeping it clearly the AI's suggestion.
- Use the required output as the main decision criterion and explain the reasoning, so the human has something concrete to ratify.
- Prefer the type that determines how results will be judged.
- The primary problem-type label is load-bearing framing: it steers method-selector. It is the human's call, not the AI's. Leave modeler_chosen_type and framing_rationale as [MODELER INPUT NEEDED: ...] sentinels for the modeler to author. Do NOT pre-fill them by copying ai_suggested_type.
Determine the secondary type if needed.
- Add a secondary type only when it materially affects method selection.
- Do not list many secondary types just to be comprehensive.
Identify candidate method families.
- List broad method families, not final models.
- Keep candidates tied to the classified task type.
- Mark candidate methods as tentative and subject to method-selector.
Flag unsuitable or risky directions.
- Mark methods that appear tempting but are unsupported by data, output requirements, interpretability needs, or contest constraints.
- Flag deep learning, black-box models, or complex heuristics when data volume, explainability, or implementation time is insufficient.
Produce a classification artifact.
- Keep the output structured and concise.
- Preserve ambiguity instead of forcing false certainty.
- Save the paired outputs under workspace/problem/problem-classifier/.
- Recommend related-paper-analyzer as the next skill if classification is complete.
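The verb cues listed in the workflow can be turned into a rough first-pass hint. The sketch below is only an illustration: the CUES table and the suggest_types function are hypothetical names, and the cue words are a condensed subset of the lists above. It ranks candidate labels by cue hits and does not replace the rule that the required output, not the verb, decides the suggested type.

```python
import re

# Hypothetical cue table, condensed from the verb cues in the workflow.
# Keys are standard type labels; the cue words are illustrative, not exhaustive.
CUES = {
    "evaluation": ["evaluate", "rank", "compare", "score", "assess"],
    "prediction": ["forecast", "predict", "estimate", "extrapolate"],
    "optimization": ["allocate", "schedule", "maximize", "minimize", "optimize"],
    "classification-clustering": ["classify", "cluster", "group", "segment"],
    "graph-routing": ["shortest path", "network", "route", "matching"],
    "simulation": ["simulate", "monte carlo", "scenario"],
}


def suggest_types(subquestion_text):
    """Return candidate type labels ordered by number of cue hits.

    This is a first-pass hint only: the required output remains the
    main decision criterion, and the human still owns the final framing.
    """
    text = subquestion_text.lower()
    hits = {}
    for label, cues in CUES.items():
        # Match each cue at a word start so "rank" also matches "ranking".
        count = sum(1 for cue in cues if re.search(r"\b" + re.escape(cue), text))
        if count:
            hits[label] = count
    # Sort by hit count (descending), then by label, for a deterministic order.
    return sorted(hits, key=lambda label: (-hits[label], label))
```

A subquestion that triggers several labels is a signal to check for a hybrid task or to decompose it further, not a reason to pick the top hit automatically.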
Outputs
Produce a problem classification artifact as paired outputs:
- workspace/problem/problem-classifier/problem_classification.json
- workspace/problem/problem-classifier/problem_classification.md
The artifacts should contain:
- classification_summary
- subquestion_classifications
- global_structure
- candidate_method_families
- unsuitable_directions
- classification_risks
- ambiguities
- recommended_next_skill
Output format
Prefer this JSON-compatible structure for workspace/problem/problem-classifier/problem_classification.json:
{
"classification_summary": {
"overall_pattern": "hybrid",
"reason": "The problem combines evaluation, prediction, and optimization across different subquestions."
},
"subquestion_classifications": [
{
"id": "Q1",
"ai_suggested_type": "evaluation",
"ai_suggestion_confidence": "high",
"modeler_chosen_type": "[MODELER INPUT NEEDED: confirm or override the primary type — the AI suggests 'evaluation']",
"framing_rationale": "[MODELER INPUT NEEDED: which type, and why this framing over the alternative — e.g. Q2 is evaluation not prediction because the graded output is a ranking, not a future value]",
"secondary_type": "data-analysis",
"classification_reason": "The subquestion asks for comparable scores or rankings based on multiple indicators.",
"output_driver": [
"ranking",
"score",
"comparative explanation"
],
"candidate_method_families": [
"indicator system construction",
"weighting methods",
"multi-criteria evaluation",
"dimensionality reduction"
],
"unsuitable_directions": [
{
"direction": "time series forecasting",
"reason": "The required output is a ranking or score, not a future value."
}
],
"risk_flags": [
"weight source must be justified",
"positive and negative indicators must be normalized consistently"
]
}
],
"global_structure": {
"is_hybrid": true,
"dependency_pattern": [
{
"from": "Q1",
"to": "Q2",
"reason": "Q2 may use Q1 scores as explanatory variables or inputs."
}
],
"suggested_workflow_order": [
"Q1",
"Q2",
"Q3"
]
},
"classification_risks": [
"Do not force the whole problem into one type if subquestions differ."
],
"ambiguities": [
"Ambiguity inherited from the problem parse, if any."
],
"recommended_next_skill": "related-paper-analyzer"
}
Also produce workspace/problem/problem-classifier/problem_classification.md with the same fields in readable Markdown form.
B-layer human-confirmation field (load-bearing framing). The primary problem-type label steers method selection, so the AI suggests but the human decides. For every subquestion:
- ai_suggested_type: the AI's pick (the old primary_type). The AI authors this, with ai_suggestion_confidence and a classification_reason.
- modeler_chosen_type: the confirmed primary type. Emit it as [MODELER INPUT NEEDED: confirm or override the primary type — the AI suggests '<ai_suggested_type>']. The AI must NOT fill this in.
- framing_rationale: emit as [MODELER INPUT NEEDED: which type, and why this framing over the alternative — e.g. Q2 is evaluation not prediction because…]. The AI must NOT fill this in.
A surviving [MODELER INPUT NEEDED (or [AI-DRAFT) sentinel in a finalized classification artifact is a Gate G1 FAIL — completeness-auditor already treats these sentinels as "not done", exactly like the C-layer <<<HUMAN>>> decision sentinel. The human must replace both sentinels before classification is "ready". Carry ai_suggested_type forward in every example below; never replace it with a finalized primary_type.
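A sentinel check of this kind could look like the sketch below. It is an assumption-laden illustration, not the actual completeness-auditor implementation, which is not shown here: the function name find_surviving_sentinels and the JSON-path style of the report are hypothetical. It walks a loaded classification artifact and reports every string that still contains one of the two sentinel prefixes.

```python
# Sentinel prefixes named in the text above. Matching on the prefix is an
# assumption; the real auditor may use a stricter pattern.
SENTINEL_PREFIXES = ("[MODELER INPUT NEEDED", "[AI-DRAFT")


def find_surviving_sentinels(node, path="$"):
    """Recursively collect paths whose string value still holds a sentinel."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            found.extend(find_surviving_sentinels(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_surviving_sentinels(value, f"{path}[{index}]"))
    elif isinstance(node, str):
        if any(prefix in node for prefix in SENTINEL_PREFIXES):
            found.append(path)
    return found
```

An empty result on a finalized artifact means no sentinel survived; any returned path points at a field the human still has to author.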
Standard problem types
Use these labels consistently:
- evaluation
- prediction
- optimization
- mechanism
- classification-clustering
- graph-routing
- simulation
- data-analysis
- hybrid
Classification guide
Evaluation
Use when the main output is:
- score
- ranking
- comparison
- grade
- risk level
- comprehensive assessment
- priority order
Common cues:
- evaluate
- rank
- compare
- assess
- measure quality
- determine importance
- build an index system
Common risks:
- indicators are not justified
- positive and negative indicators are mixed incorrectly
- weights are arbitrary
- repeated indicators inflate importance
- ranking lacks sensitivity analysis
Candidate method families:
- indicator system construction
- normalization
- weighting
- multi-criteria decision analysis
- dimensionality reduction
- grey relational analysis
- fuzzy evaluation
Prediction
Use when the main output is:
- future value
- unknown value estimate
- trend
- forecast interval
- error or uncertainty description
Common cues:
- predict
- forecast
- estimate
- infer future
- trend
- extrapolate
- fill unknown values
Common risks:
- training fit is mistaken for future generalization
- no test or validation split
- no error metric
- no explanation of failure cases
- long-term extrapolation is overclaimed
Candidate method families:
- regression
- time series analysis
- grey prediction
- machine learning regression
- ensemble prediction
- uncertainty estimation
Optimization
Use when the main output is:
- decision plan
- allocation scheme
- schedule
- route
- maximum or minimum value
- feasible strategy under constraints
Common cues:
- choose
- allocate
- schedule
- assign
- route
- maximize
- minimize
- optimize
- design a plan
Common risks:
- decision variables are not separated from state variables
- objective function is vague
- constraints are incomplete
- feasibility is not checked
- final result gives a number but not an implementable plan
Candidate method families:
- linear programming
- integer programming
- nonlinear programming
- dynamic programming
- multi-objective optimization
- heuristic search
- network optimization
Mechanism
Use when the main output is:
- equation-based explanation
- dynamic process model
- causal structure
- physical or biological mechanism
- interpretable system evolution
Common cues:
- explain mechanism
- describe process
- derive relationship
- simulate dynamics from rules
- model spread, motion, flow, growth, or decay
Common risks:
- assumptions are too strong
- parameters have no source
- units are inconsistent
- model is not validated against data or common sense
Candidate method families:
- differential equations
- difference equations
- compartment models
- conservation laws
- physical constraints
- system dynamics
Classification-clustering
Use when the main output is:
- category label
- group assignment
- segment
- anomaly label
- sample type
Common cues:
- classify
- cluster
- group
- identify category
- segment
- detect abnormal samples
Common risks:
- class labels are unclear
- number of clusters is arbitrary
- feature scaling is ignored
- validation metric is missing
Candidate method families:
- supervised classification
- unsupervised clustering
- anomaly detection
- feature engineering
- dimensionality reduction
Graph-routing
Use when the main output is:
- path
- network structure
- connectivity
- route plan
- matching
- flow allocation
- node importance
Common cues:
- network
- path
- route
- nodes
- edges
- connection
- shortest path
- flow
- matching
Common risks:
- graph abstraction loses important constraints
- edge weights are not justified
- route feasibility is not checked
- static network assumptions hide dynamic constraints
Candidate method families:
- shortest path
- minimum spanning tree
- network flow
- matching
- vehicle routing
- graph centrality
- graph search
Simulation
Use when the main output is:
- scenario result
- stochastic outcome distribution
- repeated trial behavior
- process imitation
- policy comparison under uncertainty
Common cues:
- simulate
- scenario
- random
- probability distribution
- Monte Carlo
- dynamic process
- agent behavior
Common risks:
- random seed is not fixed
- number of trials is insufficient
- simulation assumptions are hidden
- outputs are not statistically summarized
Candidate method families:
- Monte Carlo simulation
- discrete-event simulation
- agent-based simulation
- stochastic process modeling
- scenario analysis
Data-analysis
Use when the main output is:
- descriptive finding
- correlation pattern
- feature relationship
- distribution summary
- data insight before modeling
Common cues:
- analyze data
- describe pattern
- explore relationship
- identify influencing factors
- summarize characteristics
Common risks:
- correlation is overstated as causation
- figures are decorative rather than explanatory
- no connection to later modeling steps
Candidate method families:
- descriptive statistics
- correlation analysis
- hypothesis testing
- exploratory data analysis
- factor analysis
Hybrid
Use when:
- different subquestions require different task types
- one stage produces inputs for another stage
- the solution combines evaluation, prediction, optimization, simulation, or mechanism modeling
Common risks:
- treating the whole problem as one type
- skipping dependencies between subquestions
- using one model to answer all tasks poorly
Candidate method families:
- staged modeling workflow
- modular pipeline
- evaluation-then-prediction
- prediction-then-optimization
- mechanism-plus-data fitting
- simulation-plus-optimization
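When one stage produces inputs for another, a workflow order can be derived from the dependency_pattern entries in global_structure. The sketch below is one possible approach, assuming each entry has the "from" and "to" keys shown in the output format; the function name workflow_order is hypothetical. It uses the standard-library graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def workflow_order(subquestion_ids, dependency_pattern):
    """Order subquestions so that every "from" stage precedes its "to" stage."""
    # Start with every subquestion as a node with no predecessors.
    sorter = TopologicalSorter({qid: set() for qid in subquestion_ids})
    for edge in dependency_pattern:
        # "to" consumes the output of "from", so "from" is its predecessor.
        sorter.add(edge["to"], edge["from"])
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(sorter.static_order())
```

A circular dependency raises an error instead of returning an order, which is a useful prompt to revisit the decomposition before suggesting a workflow.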
Rules
- Classify by subquestion, not by the entire problem title.
- Use the required output to determine the primary type.
- The primary problem-type label is load-bearing framing that steers method-selector. The AI suggests it (ai_suggested_type + confidence + reason); the human owns the final framing. Emit modeler_chosen_type and framing_rationale as [MODELER INPUT NEEDED: ...] sentinels and let the modeler author them.
- Do not author, pre-fill, or copy ai_suggested_type into modeler_chosen_type or framing_rationale on the human's behalf.
- A surviving [MODELER INPUT NEEDED or [AI-DRAFT sentinel in the finalized classification artifact is a Gate G1 FAIL. Treat it exactly like the C-layer <<<HUMAN>>> sentinel; classification is not "ready" until the human replaces it.
- Do not select the final model.
- Do not generate code.
- Do not write paper text.
- Do not over-list candidate methods.
- Do not recommend complex or black-box methods without data and interpretability support.
- Do not hide classification uncertainty.
- Do not override ambiguities inherited from problem-parser.
- Do not move directly to model-code-analyzer.
Verification
Before handing off, verify:
- Every parsed subquestion has an ai_suggested_type (the AI's suggested primary type) with a confidence and reason.
- Every parsed subquestion still carries the [MODELER INPUT NEEDED: ...] sentinels for modeler_chosen_type and framing_rationale; the AI must not have authored or copied them. A surviving sentinel is a Gate G1 FAIL the human must clear (it confirms the framing has not yet been ratified); the AI hands off with these sentinels intact, not pre-filled.
- Secondary types are used only when necessary.
- Classification reasons refer to task wording, required output, and data conditions.
- Candidate method families are broad, not final model choices.
- Unsuitable directions and classification risks are listed.
- Hybrid structure and dependencies are identified if present.
- The next skill is related-paper-analyzer.
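Several of the checks above are mechanical and could be scripted. The sketch below is a hedged illustration, assuming it runs on the AI's output before any human edits; the function name verify_before_handoff is hypothetical, and it covers only the field-level checks, not the judgment calls about secondary types or method breadth.

```python
# Standard labels as listed under "Standard problem types".
STANDARD_TYPES = {
    "evaluation", "prediction", "optimization", "mechanism",
    "classification-clustering", "graph-routing", "simulation",
    "data-analysis", "hybrid",
}
SENTINEL = "[MODELER INPUT NEEDED"


def verify_before_handoff(artifact):
    """Return a list of problems found; an empty list means the checks pass."""
    problems = []
    for item in artifact.get("subquestion_classifications", []):
        qid = item.get("id", "<missing id>")
        if item.get("ai_suggested_type") not in STANDARD_TYPES:
            problems.append(f"{qid}: ai_suggested_type is missing or not a standard label")
        for field in ("ai_suggestion_confidence", "classification_reason"):
            if not item.get(field):
                problems.append(f"{qid}: {field} is missing")
        for field in ("modeler_chosen_type", "framing_rationale"):
            # At hand-off the AI must leave these as sentinels, not pre-fill them.
            if not str(item.get(field, "")).startswith(SENTINEL):
                problems.append(f"{qid}: {field} should still be a sentinel")
    if artifact.get("recommended_next_skill") != "related-paper-analyzer":
        problems.append("recommended_next_skill should be related-paper-analyzer")
    return problems
```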
Failure modes
Stop and report a blocker if:
- No validated problem parse exists.
- Subquestions are missing or too vague to classify.
- Required outputs are unknown.
- A key attachment or data description is needed to determine the task type.
- The user asks for final model selection before classification is complete.
Stop conditions
This skill must stop instead of guessing when:
- Multiple classifications would lead to materially different modeling workflows.
- The task wording is too incomplete to infer the required output.
- The classification depends on unavailable data fields or missing contest requirements.
- Continuing would require inventing the user's intent.
When stopping, output:
- the blocker
- why it matters
- the minimum information needed
- the partial classifications that are still safe
- the recommended next action
Handoff
After producing a validated classification artifact, hand off to:
related-paper-analyzer
The handoff should include:
- each subquestion ID
- ai_suggested_type (the AI's suggested primary type) plus the modeler_chosen_type / framing_rationale status (confirmed by the human, or still a [MODELER INPUT NEEDED] sentinel)
- secondary type if needed
- classification reason
- candidate method families
- unsuitable directions
- risk flags
- dependency pattern
Do not hand off to method-selector directly unless literature analysis has already been completed or intentionally skipped.
Examples
Example 1: Evaluation, prediction, optimization chain
Input state:
- Q1 asks to rank cities by resilience.
- Q2 asks to forecast next-year demand.
- Q3 asks to allocate limited resources.
Output:
{
"classification_summary": {
"overall_pattern": "hybrid",
"reason": "The problem combines evaluation, prediction, and optimization in sequence."
},
"subquestion_classifications": [
{
"id": "Q1",
"ai_suggested_type": "evaluation",
"ai_suggestion_confidence": "high",
"modeler_chosen_type": "[MODELER INPUT NEEDED: confirm or override the primary type — the AI suggests 'evaluation']",
"framing_rationale": "[MODELER INPUT NEEDED: which type, and why this framing over the alternative — e.g. Q1 is evaluation not prediction because the graded output is a ranking, not a future value]",
"secondary_type": "data-analysis",
"classification_reason": "Q1 asks for city ranking based on multiple indicators.",
"candidate_method_families": [
"indicator system construction",
"weighting methods",
"multi-criteria evaluation"
],
"risk_flags": [
"indicator selection and weight source must be justified"
]
},
{
"id": "Q2",
"ai_suggested_type": "prediction",
"ai_suggestion_confidence": "high",
"modeler_chosen_type": "[MODELER INPUT NEEDED: confirm or override the primary type — the AI suggests 'prediction']",
"framing_rationale": "[MODELER INPUT NEEDED: which type, and why this framing over the alternative — e.g. Q2 is prediction not evaluation because the graded output is a future demand value, not a ranking]",
"secondary_type": "data-analysis",
"classification_reason": "Q2 asks for future demand estimates.",
"candidate_method_families": [
"regression",
"time series analysis",
"ensemble prediction"
],
"risk_flags": [
"prediction error and generalization must be checked"
]
},
{
"id": "Q3",
"ai_suggested_type": "optimization",
"ai_suggestion_confidence": "high",
"modeler_chosen_type": "[MODELER INPUT NEEDED: confirm or override the primary type — the AI suggests 'optimization']",
"framing_rationale": "[MODELER INPUT NEEDED: which type, and why this framing over the alternative — e.g. Q3 is optimization not evaluation because the graded output is an allocation plan under constraints, not a score]",
"secondary_type": "multi-objective decision",
"classification_reason": "Q3 asks for an allocation plan under limited resources.",
"candidate_method_families": [
"linear programming",
"integer programming",
"multi-objective optimization"
],
"risk_flags": [
"decision variables, objective function, and constraints must be explicit"
]
}
],
"recommended_next_skill": "related-paper-analyzer"
}
Example 2: Avoid whole-problem overclassification
Input state:
- The title suggests prediction.
- Q1 asks for data cleaning and correlation analysis.
- Q2 asks for future sales prediction.
- Q3 asks for pricing optimization.
Output:
{
"classification_summary": {
"overall_pattern": "hybrid",
"reason": "The title is prediction-oriented, but the subquestions require data analysis, prediction, and optimization."
},
"subquestion_classifications": [
{
"id": "Q1",
"ai_suggested_type": "data-analysis",
"ai_suggestion_confidence": "medium",
"modeler_chosen_type": "[MODELER INPUT NEEDED: confirm or override the primary type — the AI suggests 'data-analysis']",
"framing_rationale": "[MODELER INPUT NEEDED: which type, and why this framing over the alternative — e.g. Q1 is data-analysis not prediction because the graded output is influencing-factor relationships, not future values]",
"classification_reason": "Q1 asks for relationships and influencing factors rather than future values."
},
{
"id": "Q2",
"ai_suggested_type": "prediction",
"ai_suggestion_confidence": "high",
"modeler_chosen_type": "[MODELER INPUT NEEDED: confirm or override the primary type — the AI suggests 'prediction']",
"framing_rationale": "[MODELER INPUT NEEDED: which type, and why this framing over the alternative — e.g. Q2 is prediction not data-analysis because the graded output is future sales values]",
"classification_reason": "Q2 asks for future sales estimates."
},
{
"id": "Q3",
"ai_suggested_type": "optimization",
"ai_suggestion_confidence": "high",
"modeler_chosen_type": "[MODELER INPUT NEEDED: confirm or override the primary type — the AI suggests 'optimization']",
"framing_rationale": "[MODELER INPUT NEEDED: which type, and why this framing over the alternative — e.g. Q3 is optimization not prediction because the graded output is a pricing decision under constraints]",
"classification_reason": "Q3 asks for a pricing decision."
}
],
"recommended_next_skill": "related-paper-analyzer"
}
Example 3: Blocked classification
Input state:
- Q2 asks to “evaluate the system.”
- The evaluation target and output form are not specified.
- No attachment fields are available.
Output:
{
"blocked_items": [
"The evaluation target and required output form are unclear."
],
"partial_classifications": [],
"missing_information": [
"What object is being evaluated",
"Whether the required output is a score, ranking, grade, or recommendation",
"Available indicator fields"
],
"recommended_next_action": "Return to problem-parser or ask the user for the missing output requirements."
}