name: deepseek-25-onboarding description: Use this skill when adding a new middle-school problem, especially a question 25, to the Strategy Planner + DeepSeek solver pipeline from either a problem image or an existing lesson-spec problem. The agent creates canonical ProblemIR, expected answers, DeepSeek integration tests, recorded executable StepIntent fixtures, few-shot projection, and any reusable method/family/recipe/binding/normalizer code needed to make the tests pass.
DeepSeek 25 Onboarding
Use this skill to onboard a new 第 25 题 into the Strategy Planner solver path:
canonical ProblemIR
-> StrategyPlanner(deepseek or recorded)
-> StepIntent
-> RecipeTrialExecutor
-> RuntimeOrchestrator
-> ResultBuilder
The work is test-driven: create the DeepSeek integration test first, then add only reusable data/code needed to make the test pass.
Input Modes
Exactly one input mode should be used.
Mode A: Problem Image
Use when the user provides an image or screenshot of the problem.
- Extract the full original text, source, question number, objects, conditions, scopes, subquestions, and visible answer information.
- If the image does not include standard answers, solve the problem independently and create expected answers from the verified solution.
- Build canonical solver artifacts from the extracted problem. Do not use image-derived text as runtime truth after canonical ProblemIR is created.
Mode B: Existing Problem In The Repository
Use when the user refers to an existing题库 problem.
- Search
internal/lesson-specs/*/01_problem.mdby problem id, source, district, exam name, or distinctive problem text. - Read the matching
01_problem.md. - Read
02_solution.mdandlesson-data.jsononly to confirm answers, scopes, and likely student solution strategy. - The solver fact source is still canonical ProblemIR, not lesson markdown or page JSON.
Required Artifacts
For <problem_id>, create or update:
internal/solver-fixtures/<problem_id>.jsonserver/tests/solver/expected/<problem_id>.expected.jsonserver/tests/solver/test_deepseek_strategy_planner_<case>.py- after DeepSeek succeeds:
internal/solver-fixtures/<problem_id>.executable-step-intents.json - after recorded succeeds:
internal/few-shots/<problem_id>.few-shot.json
The canonical ProblemIR is the only authored problem fact source. Its input object must contain only problem_id, pattern, problem_type, display, original_text, scopes, entities, facts, and question_goals.
display is generated during ProblemIR creation, not by LessonIR or VisualBuilder. It should contain student-page metadata such as:
summary: one-line problem card summary, e.g.第 25 题(2026 天津市和平区一模)二次函数综合:解析式、角度条件与 OM+BN 路径最值。page_title: full page title.breadcrumb_title: short breadcrumb title.
original_text remains the faithful problem text metadata and lines. Include source, number, optional score, and the exact original problem lines.
Do not hand-author runtime compatibility fields such as symbols, symbol_roles, constraints, data.function, data.entities.points, data.entities.items, data.relations, data.path_problem, questions[].conditions, or questions[].goals.target_path. RuntimeProjection derives those fields for ContextBuilder and ResultBuilder.
It must not include expected answers, raw DeepSeek output, method chains, runtime ContextPath values, derived solution facts, or auxiliary solution-only entities.
Capability Abstraction Principles
The solver grows by adding reusable mathematical capabilities, not by patching one problem at a time.
Method Level
A method describes what mathematical problem it can solve, not the procedural steps of one concrete solution. Write method summaries in this shape:
Given <semantic inputs>, derive <semantic output> under <applicability/preconditions>.
Good method specs say:
- what class of conclusion the method derives;
- which semantic inputs are required;
- which output types it produces;
- which preconditions make the method applicable;
- whether symbolic or parameterized expressions are supported;
- what the method intentionally does not solve.
Avoid current-case wording such as fixed problem ids, district names, subquestion ids, answer values, or point names that are not abstract roles. Prefer role language such as fixed point, moving point, reference angle, target point, known intercept, dynamic parameter, and path expression.
The method Python SPEC is the source of truth. Generated internal/method-specs/*.json files must be synchronized from code, not edited by hand.
Recipe Level
A recipe describes a standard executable solving action that may contain one or more methods. Use a recipe when the LLM needs a high-level menu item such as “turn an equal-length two-moving-point path into a single-moving-point distance”, while the code owns the internal method sequence and wiring.
Do not create recipes that merely restate one current problem’s helper point construction. If the recipe cannot be described without the current letters, it is not abstract enough.
Family Level
strategy_principles should describe the student-friendly mathematical strategy for the family. step_recipes should describe standard executable actions in that strategy. The family name should reflect the core structure, such as weighted path transformation or equal-length ray path reduction, not a single exam problem.
The LLM should prefer recipe_hint from the recipe catalog first, then method ids from the method catalog. It may leave the hint empty only when no catalog entry fits; the code may still resolve such steps if the capability match is unique.
Workflow
1. Establish The Problem Contract
- Choose a stable
problem_idmatching repository naming, such astj-2026-xiqing-yimo-25. - Choose
patternandproblem_typefrom the structured problem, not from LLM guesswork during solving. - Fill
display.summary/page_title/breadcrumb_titlefrom the problem source, number, and key visible goals. Do not leave page summary generation to LessonIR or VisualBuilder. - Create expected answers separately under
server/tests/solver/expected/. - Make sure
QuestionGoal.handleusesanswer:<goal_id>and canonical entity/fact handles follow the repository naming scheme. - For Point answers, set
question_goals[].target_handleto the canonical point entity. For non-Point answers, the projection writes the result to the runtimeoutputscontainer. - If an answer produced in a subquestion is intentionally reusable by sibling steps, set
question_goals[].valid_scopeto the broader proof scope. Do not expose a runtimetarget_path. - Keep canonical ProblemIR text-faithful: every initial
EntityandFactmust be directly stated by the original problem text. Do not pre-fill coordinates, expressions, equalities, helper points, helper lines, or transformed path facts that are obtained by solving. - If the problem states a definitional object such as “C is the y-axis intercept” or “D is C translated right by 2”, encode that as an entity definition, not as a coordinate fact. The runtime/method layer must derive its value.
- If a solution introduces an auxiliary object such as an intersection/helper point, put it in StepIntent
creates[]and produce its computed fact inproduces[]; do not add it to the canonical ProblemIR.
2. Add The DeepSeek Test First
Create a test like the Xiqing pattern:
- Default tests must not call the network.
- The real DeepSeek test is opt-in with
RUN_LLM_INTEGRATION=1,RUN_DEEPSEEK_STRATEGY_PLANNER=1, and a case-specific flag. - At the start of the real test, clear
internal/solver-runs/strategy-planner-deepseek-<case>/. - The real test must call
solve_problem(problem, runtime_config=SolverRuntimeConfig.from_sources(planner_mode="strategy", llm_provider="deepseek", ...)). - Success means
result.status == "ok"andresult.answers == expected.
3. Run And Classify Failures
Use the debug artifacts to classify each failure before changing code.
Inspect the whole attempt, not only the first blocking exception. When available, read the raw draft, effective draft, normalization report, candidate report, execution diagnostic, accepted prefix, and previous-attempt payload.
Compare attempts when a later attempt succeeds. The most useful bugs often appear as a small shape difference between the failed and successful drafts. If the failed draft is mathematically correct but differs by naming, missing alias, multi-output shape, omitted state fact, or step ordering, prefer a code-side fix and add a fixed regression fixture.
Prefer this classification order:
- ProblemIR gap: missing entity, fact, scope, constraint, relation, or answer goal.
- FamilySpec gap: wrong family, missing
strategy_principles, missingmethod_ids, missing recipe, or missing binding rule. - Method gap: no reusable method can compute the needed fact/entity/answer.
- Recipe gap: multiple methods form a standard reusable solving action.
- Binding gap: method exists but semantic handles cannot bind to input slots.
- Normalizer gap: LLM output has a structurally safe, deterministic rewrite.
- Prompt/few-shot gap: LLM repeatedly chooses an invalid strategy or non-executable step granularity.
Do not patch runtime logic by matching a specific problem_id, exam title, fixed point name, or subquestion id.
Use these decision rules:
- Add or extend a method when a reusable, checkable mathematical transformation/calculation is missing.
- Add or extend a recipe when several methods form a reusable solving action and the LLM needs a high-level hint.
- Add a binding selector when the method already exists but canonical handles cannot map reliably to method input slots.
- Add a normalizer only for deterministic structural drift, such as safe alias correction, scope widening to a visible parent, output-type alias, duplicate exact
creates, or harmless utility fact rewrite. - Tune prompt/few-shot when capabilities already exist but the LLM repeatedly chooses the wrong family strategy or wrong executable granularity.
Treat the first blocker as the execution boundary, not as the whole diagnosis. Use preflight warnings and skipped-step review to surface downstream issues of the same kind, but do not ask the LLM to rewrite accepted prefix steps unless the prefix itself is wrong.
The repair loop is stateless chat-wise. If a round fails, preserve rich repair context: previous raw/effective StepIntent draft, accepted prefix, applied fills, blockers, skipped steps, and repair instructions. A validation-only failure must not erase a previous rich execution diagnostic.
Code-Side Absorption Rules
Absorb deterministic, structure-preserving deviations in code:
- If LLM reads an entity handle but omits the already-computed state fact, use entity-state resolution to bind the unique visible state. This applies to points, functions, symbols, segments, and other entity classes when the runtime type is known.
- If LLM uses an enumerated alias or parent-scope handle variant, canonicalize it through explicit alias rules only. Do not use fuzzy spelling correction.
- If a method returns multiple outputs, map produced handles to method output keys by structured signals first:
output_type, answer value type, fact type, and handle semantic name. Never rely on output order or natural-language description as the primary mapping. - If LLM emits a recipe’s internal method sequence, fold it back to the public recipe only when the recipe declares or clearly owns that sequence and the rewrite preserves all produced handles needed downstream.
- If a method has verified companion outputs, register them as readable runtime aliases even when the LLM does not explicitly produce every companion fact.
- If an existing method can safely prepare a missing prerequisite object, use declarative prep rules and expose only semantic StepIntent facts to the LLM, not runtime paths.
Do not absorb true mathematical gaps. Missing construction, missing relation, wrong family strategy, or an unsupported transformation should become repair feedback or a new reusable method/recipe.
Planner Insight And Repair Feedback
Some methods reveal roles that the LLM should not guess before execution, such as the moving point after path reduction, selected candidates, or endpoints from a straightening recipe. Expose these as planner-visible insights in previous_attempts, not as expected answers.
Good planner insights:
- use canonical handles and output types only;
- say what was learned, such as
moving_point,fixed_points,transformed_path, or recommended next capability; - never include
ContextPath, runtime invocation details, traceback, or expected answer values.
Repair guidance belongs near the capability that understands the failure:
- method-owned repair hints live in method Python
SPEC.repair_hints; - recipe-owned hints live with recipe execution metadata;
- binding-selector hints live with the selector or binding rule;
RepairFeedbackBuilderonly collects, ranks, merges, and filters hints for LLM consumption.
Do not hard-code a family-specific repair instruction in the generic builder or compiler. If a repair message names a method or recipe, it should be scoped by capability id, recipe id, or binding selector.
4. Add Reusable Capability
When adding or extending a method:
- Put implementation and
SPECin the method Python file. - Treat Python
SPECas the source of truth; generatedinternal/method-specs/*.jsonis derived. - Write the method summary as a capability statement, not an operation trace for one题.
- Include applicability, preconditions, input/output semantics, unsupported cases, and symbolic/parameterized support.
- If the method is useful for student explanation, describe the reusable mathematical idea, not a current-case derivation. Detailed derivation belongs to the explanation layer.
- If the method returns multiple useful values, declare all outputs in
SPEC.outputs. Compiler alias registration must map those outputs structurally, especially for multi-output methods such as geometry transforms. - If the method can repair a common misuse, add
repair_hintsto the methodSPECinstead of putting the message in generic runtime code. - Run the method spec generator:
cd server && uv run python -m shuxueshuo_server.solver.runtime.methods.generate_specs
- Add method unit tests, including old-case regression if extending an existing method.
When adding recipe or binding capability:
- Add recipe and method binding rules to the relevant
SolverFamilySpec. - Keep recipe generic: no problem id, no fixed point names, no answer values, and no hard-coded current-question path.
- Describe what the recipe solves at the executable step level, for example “reduce a two-moving-point path by an equal-length ray construction,” not “construct point F in 和平”.
- Let recipe execution own internal helper entities and method wiring when they are part of the standard action.
- Add unit tests for recipe selection, selector existence, binding behavior, and any normalizer rewrite.
- For recipe internals, keep a clear public contract. If the LLM should call the recipe, do not require it to know every internal method output; the compiler should wire and register companion outputs.
- If later planning depends on an internal result, expose a planner insight rather than expanding the recipe into a problem-specific mega method.
When changing prompt/few-shot:
- Keep StepIntent at method/recipe executable granularity, not student-facing explanation granularity.
- Do not put expected answers into prompt or payload.
- Prefer family strategy principles and verified few-shot examples over single-case prompt hacks.
- In tests, do not use the current problem as few-shot. Use a verified different example or a family mock fallback.
- A few-shot example teaches executable structure; webpage explanation steps may be coarser and are produced later by
ExplanationBuilder.
5. Solidify The Passing Case
After the real DeepSeek test passes:
- Convert the successful parsed StepIntent into
internal/solver-fixtures/<problem_id>.executable-step-intents.json. - Ensure every executable step has a non-null
recipe_hintthat matches a recipe id or method id unless the code can uniquely resolve it by design. - Add recorded Strategy E2E and assert it does not call deterministic planner.
- Generate the few-shot projection for the problem.
- Run focused tests and full solver regression.
6. Handoff To Lesson Page Generation
After the solver/runtime path is stable, use solver-to-lesson-page-onboarding for the student-facing page pipeline:
successful runtime artifacts
-> ExplanationSnapshot
-> LessonIR
-> VisualStepIR
-> interactions / animations
-> generated three JSON artifacts
-> compiled HTML
Keep webpage rendering requirements out of solver onboarding unless the page failure reveals a missing reusable runtime fact, method, recipe, binding, or planner insight. The solver skill answers “can the system solve the problem?”; the lesson-page skill answers “can the system teach, draw, animate, and compile the solved problem into a page?”
Commands
Recorded CLI smoke:
cd server && uv run python -m shuxueshuo_server.solver.solve_problem \
--fixture ../internal/solver-fixtures/<problem_id>.json \
--planner strategy --llm-provider recorded
Real DeepSeek:
cd server && RUN_LLM_INTEGRATION=1 RUN_DEEPSEEK_STRATEGY_PLANNER=1 \
RUN_DEEPSEEK_<CASE>_STRATEGY_PLANNER=1 \
uv run pytest tests/solver/test_deepseek_strategy_planner_<case>.py -q -s
Focused regression:
cd server && uv run pytest tests/solver/test_strategy_planner_phase1.py \
tests/solver/test_strategy_planner_production.py \
tests/solver/test_deepseek_strategy_planner_<case>.py -q
Full regression:
cd server && uv run pytest tests/solver -q
git diff --check
Guardrails
- Do not keep both a solver fixture and a separate LLM ProblemIR as independent truth sources.
- Do not store expected answers inside canonical ProblemIR.
- Do not store derived conclusions in canonical ProblemIR, including computed coordinates for definitional points, simplified equations, final parameter values, or auxiliary construction points.
- Do not use raw DeepSeek responses as permanent golden fixtures.
- Do not add deterministic planners for new cases.
- Do not let Strategy Planner output runtime paths, method input bindings, or
ctx_Nids. - If a change helps only the current problem and cannot be described as reusable family/method/recipe behavior, stop and redesign the abstraction.