name: external-gitcode-ascend-msverl-daily-regression-triage
description: Triage a daily msverl regression run by reading the baseline comparison log, stopping on success, extracting the most relevant training failure evidence from the daily training log when needed, collecting recent commits from verl main and MindSpeed master, and ranking the most likely culprit commits with concise fix-direction guidance.
original-name: msverl-daily-regression-triage
synced-from: https://gitcode.com/Ascend/agent-skills
synced-date: '2026-05-26'
synced-commit: 1f7666e7768a0ceb21bb1d40ce4b5179fcb6f1d6
license: UNKNOWN
MSVerl Daily Regression Triage
Use this skill when a fixed daily verl + MindSpeed training job has run and Codex needs to decide whether the result is healthy, whether there is a training failure or an accuracy regression, and which recent commit is the most likely cause.
Defaults
- Baseline comparison log: `/home/st_daily_verl/msverl.log`
- Training log pattern: `/home/st_daily_verl/logs/msverl_YYYYMMDD.log`
- `verl` repo: `https://github.com/verl-project/verl.git` on `main`
- `MindSpeed` repo: `https://gitcode.com/Ascend/MindSpeed.git` on `master`
- Cache root for temporary clones: `/tmp/msverl-skill-cache`
- Time window: from local previous day `00:00:00` to the task execution time
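
The default time window and training log path can be derived as in the minimal sketch below. The function names are hypothetical, and it is an assumption that the `YYYYMMDD` in the log name is the date of the run being triaged and that timestamps are naive local times.

```python
from datetime import datetime, time, timedelta

# Pattern taken from the defaults above; {date} stands in for YYYYMMDD.
LOG_PATTERN = "/home/st_daily_verl/logs/msverl_{date}.log"


def default_time_window(now: datetime) -> tuple[datetime, datetime]:
    """Return (start, end): previous local day at 00:00:00 up to `now`."""
    start = datetime.combine(now.date() - timedelta(days=1), time.min)
    return start, now


def training_log_path(run_day: datetime) -> str:
    """Fill the YYYYMMDD slot of the training log pattern (assumed: run date)."""
    return LOG_PATTERN.format(date=run_day.strftime("%Y%m%d"))
```

If the user supplies a different window, pass those bounds through instead of calling `default_time_window`.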
Hard Stop Rules
- Read the comparison log first.
- If it contains `mean abs diff:` and the parsed value is exactly `0`, stop and report success.
- If it contains `mean abs diff:` and the value is non-zero, classify as `accuracy_regression`.
- If it contains `error, please check log`, classify as `train_error`.
- If the comparison log is ambiguous, report `unknown` and explain what evidence is missing before doing expensive work.
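
The rules above can be sketched as a small classifier. This is an illustration only, not the actual contents of parse_result_log.py. The number format after `mean abs diff:`, the choice to use the last occurrence, and checking the diff marker before the error marker are all assumptions.

```python
import re

# Assumed format: "mean abs diff:" followed by a plain or scientific-notation number.
_DIFF_RE = re.compile(r"mean abs diff:\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def classify_comparison_log(text: str) -> str:
    """Map comparison-log text to pass / accuracy_regression / train_error / unknown."""
    matches = _DIFF_RE.findall(text)
    if matches:
        value = float(matches[-1])  # assumption: the last reported value is final
        return "pass" if value == 0 else "accuracy_regression"
    if "error, please check log" in text:
        return "train_error"
    # Neither marker found: report unknown rather than guessing.
    return "unknown"
```

A value such as `nan` does not match the assumed number format, so it falls through to `unknown` instead of being misread as a pass.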
Workflow
- Run parse_result_log.py on the comparison log.
- Stop immediately on `pass`.
- For `train_error`, run extract_failure_tail.py against the daily training log and keep only the final high-signal error block.
- For `accuracy_regression`, use the parsed reward lists and `mean abs diff` as the primary evidence.
- Sync lightweight local clones with sync_repos.py.
- Collect recent commits with list_recent_commits.py for both repositories inside the default time window unless the user gives a different one.
- Rank suspects with rank_candidate_commits.py.
- Inspect diffs only for the top few commits when titles and touched files are not enough to explain a plausible fix direction.
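
One way to rank suspects from titles and touched files alone is a token-overlap score against the extracted evidence. The sketch below is a hypothetical heuristic, not the logic of rank_candidate_commits.py. The `Commit` type, the weights, and the minimum token length are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Commit:
    repo: str
    sha: str
    title: str
    files: list[str] = field(default_factory=list)


def _tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens of length >= 3 (short tokens are mostly noise)."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if len(t) >= 3}


def rank_commits(evidence: str, commits: list[Commit], top_n: int = 3) -> list[tuple[int, Commit]]:
    """Score each commit by token overlap with the evidence; return the top_n."""
    wanted = _tokens(evidence)
    scored = []
    for commit in commits:
        title_hits = len(wanted & _tokens(commit.title))
        file_hits = len(wanted & _tokens(" ".join(commit.files)))
        # Assumption: a touched-file match is weighted twice as much as a title match.
        scored.append((title_hits + 2 * file_hits, commit))
    # The sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```

A top score of zero means the evidence shares no vocabulary with any commit, which is a signal to report low confidence rather than name a culprit.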
Cost Controls
- Never load the whole training log unless the tail-based extractor fails twice.
- Start with the log tail only; prefer the last traceback or last `ERROR` block.
- Rank commits using title and touched files before reading diffs.
- Limit deep diff reading to the top `3` candidates per repository unless the evidence is still weak.
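
A tail-first read along these lines keeps memory and token cost bounded. This is a minimal sketch, not the actual extract_failure_tail.py. The 64 KiB tail size, the 40-line block limit, and the exact markers searched for are assumptions.

```python
import os


def read_tail(path: str, max_bytes: int = 64 * 1024) -> str:
    """Read at most the last max_bytes of a file without loading the rest."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        fh.seek(max(0, size - max_bytes))
        return fh.read().decode("utf-8", errors="replace")


def last_error_block(tail: str, max_lines: int = 40) -> str:
    """Return text from the last traceback, else from the last ERROR line, else ''."""
    lines = tail.splitlines()
    # Prefer the last Python traceback; fall back to the last line containing ERROR.
    for marker in ("Traceback (most recent call last):", "ERROR"):
        for i in range(len(lines) - 1, -1, -1):
            if marker in lines[i]:
                return "\n".join(lines[i:i + max_lines])
    return ""
```

An empty result is the cue to retry with a larger `max_bytes` before falling back to the whole log.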
Expected Output
Return a compact report with:
- `status`: `pass`, `train_error`, `accuracy_regression`, or `unknown`
- `time_window`
- `evidence_summary`
- `candidate_repo`
- `candidate_commits`
- `confidence`: `high`, `medium`, or `low`
- `fix_direction`
When evidence is weak, say so clearly instead of forcing a single-commit claim.
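
The report fields can be held in a small validated structure. In the sketch below the field names come from the list above, while the Python types, the default values, and the JSON rendering are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field

STATUSES = {"pass", "train_error", "accuracy_regression", "unknown"}
CONFIDENCES = {"high", "medium", "low"}


@dataclass
class TriageReport:
    status: str
    time_window: str
    evidence_summary: str
    candidate_repo: str = ""
    candidate_commits: list[str] = field(default_factory=list)
    confidence: str = "low"  # assumption: default to low until evidence says otherwise
    fix_direction: str = ""

    def __post_init__(self) -> None:
        # Reject values outside the documented vocabularies.
        if self.status not in STATUSES:
            raise ValueError(f"unexpected status: {self.status!r}")
        if self.confidence not in CONFIDENCES:
            raise ValueError(f"unexpected confidence: {self.confidence!r}")


def render(report: TriageReport) -> str:
    """Serialize the report as indented JSON."""
    return json.dumps(asdict(report), indent=2)
```

Leaving `candidate_commits` empty with `confidence` at `low` is a valid report and matches the guidance not to force a single-commit claim.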
References
- Run triage_msverl_regression.py for an end-to-end local workflow.
- Use parse_result_log.py and extract_failure_tail.py separately when validating logs by hand.
- Use list_recent_commits.py when you need a raw recent-commit inventory without ranking.
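
For a raw inventory by hand, plain `git log` with `--since` and `--until` covers the same ground. The helper names and the tab-separated output format below are assumptions and do not describe how list_recent_commits.py works internally.

```python
import subprocess
from datetime import datetime


def git_log_command(repo_dir: str, ref: str, start: datetime, end: datetime) -> list[str]:
    """Build a git log command that emits sha, committer date, and title per line."""
    return [
        "git", "-C", repo_dir, "log", ref,
        f"--since={start.isoformat()}",
        f"--until={end.isoformat()}",
        "--pretty=format:%H%x09%cI%x09%s",  # %x09 is a tab separator
    ]


def list_commits(repo_dir: str, ref: str, start: datetime, end: datetime) -> list[dict[str, str]]:
    """Run git log on an existing clone and parse one commit per output line."""
    result = subprocess.run(
        git_log_command(repo_dir, ref, start, end),
        check=True, capture_output=True, text=True,
    )
    commits = []
    for line in result.stdout.splitlines():
        if not line.strip():
            continue
        # maxsplit=2 keeps any tabs inside the title intact.
        sha, date, title = line.split("\t", 2)
        commits.append({"sha": sha, "date": date, "title": title})
    return commits
```

Note that `--since` and `--until` filter on committer date, so a commit authored earlier but merged inside the window still appears.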