tool-integrated-reasoning-recipe - SKILL.md Agent Skill

name: tool-integrated-reasoning-recipe
description: Full-pipeline recipe for injecting tool-use into thinking models without sacrificing text-only reasoning. Two-stage SFT+RL approach. Based on arXiv 2605.06326.
category: llm-reasoning

Tool-Integrated Reasoning (TIR) Recipe

Overview

A comprehensive recipe for injecting natural tool-use behavior into strong thinking models without sacrificing text-only reasoning. With it, Qwen3 4B and Qwen3 30B reach 96.7% and 99.2%, respectively, on AIME 2025.

Core Pipeline

Stage 1: TIR SFT

Learnable teacher trajectories: Prioritize problems naturally suited for tool-augmented solutions
Control tool-use proportion: Balance tool-use vs text-only to mitigate catastrophic forgetting
Optimize for pass@k and response length, not training loss: this preserves headroom for RL
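Choosing SFT checkpoints by pass@k instead of training loss requires a pass@k estimate. The sketch below assumes n sampled generations per evaluation problem and uses the standard unbiased estimator pass@k = 1 - C(n-c, k) / C(n, k), popularized by the HumanEval benchmark. The `CheckpointEval` type, the `select_checkpoint` rule, and the `max_mean_tokens` length budget are hypothetical. The source does not say exactly how response length should enter the decision, so treating it as a cap is an assumption.

```python
from dataclasses import dataclass
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: chance that at least one of k samples drawn without
    replacement from n generations (c of them correct) is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


@dataclass
class CheckpointEval:
    name: str
    counts: List[Tuple[int, int]]  # (num_samples, num_correct) per eval problem
    mean_response_tokens: float


def mean_pass_at_k(ev: CheckpointEval, k: int) -> float:
    return sum(pass_at_k(n, c, k) for n, c in ev.counts) / len(ev.counts)


def select_checkpoint(
    evals: List[CheckpointEval], k: int, max_mean_tokens: float
) -> CheckpointEval:
    """Hypothetical rule: highest mean pass@k among checkpoints whose mean
    response length fits the budget. Training loss is deliberately not used."""
    eligible = [e for e in evals if e.mean_response_tokens <= max_mean_tokens]
    if not eligible:
        raise ValueError("no checkpoint fits the response-length budget")
    return max(eligible, key=lambda e: mean_pass_at_k(e, k))


if __name__ == "__main__":
    evals = [
        CheckpointEval("step-1000", [(4, 1), (4, 0)], 9000.0),
        CheckpointEval("step-2000", [(4, 2), (4, 1)], 11000.0),
        CheckpointEval("step-3000", [(4, 4), (4, 3)], 40000.0),
    ]
    print(select_checkpoint(evals, k=2, max_mean_tokens=32000).name)  # step-2000
```

In this example step-3000 has the best pass@2 but exceeds the illustrative length budget, so step-2000 is selected.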

Stage 2: RL with Verifiable Rewards (RLVR)

Suitable SFT initialization: Start from a well-tuned SFT checkpoint
Explicit mode collapse safeguards: Prevent over-reliance on tools
Simple RLVR setup: Effective when built on a good SFT foundation
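The two RL-side pieces, a verifiable reward and a mode-collapse check, might look like the minimal sketch below. It assumes AIME-style problems whose final answer appears in `\boxed{...}` and rollouts that record how many tool calls they made. The exact-match reward, the `Rollout` type, and the `[low, high]` band in `check_mode_collapse` are illustrative assumptions, not the paper's stated safeguards. What a trainer should do when a warning fires is left open.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches \boxed{...}; assumes the final answer contains no nested braces.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_answer(response: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in the response, if any."""
    matches = BOXED.findall(response)
    return matches[-1].strip() if matches else None


def verifiable_reward(response: str, reference: str) -> float:
    """Binary outcome reward: 1.0 iff the extracted answer matches the reference."""
    answer = extract_answer(response)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


@dataclass
class Rollout:
    response: str
    num_tool_calls: int


def tool_use_rate(batch: List[Rollout]) -> float:
    """Fraction of rollouts in the batch that made at least one tool call."""
    return sum(r.num_tool_calls > 0 for r in batch) / len(batch)


def check_mode_collapse(
    batch: List[Rollout], low: float = 0.1, high: float = 0.9
) -> Optional[str]:
    """Hypothetical safeguard: warn when the batch tool-use rate leaves [low, high]."""
    rate = tool_use_rate(batch)
    if rate > high:
        return f"possible collapse toward tool use (rate={rate:.2f})"
    if rate < low:
        return f"possible collapse toward text-only (rate={rate:.2f})"
    return None
```

Checking both bounds is a design choice: the source names over-reliance on tools as the risk, but a slide toward text-only reasoning would also erase the tool-use behavior SFT added.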

Key Findings

Tool-enabled evaluation can degrade reasoning even with minimal tool calls
TIR SFT effectiveness hinges on teacher trajectory learnability
Tool-use proportion must be controlled to avoid forgetting
Optimizing for pass@k and response length works better than optimizing for training loss
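The first finding suggests scoring each model twice, once with tools disabled and once with tools enabled, and inspecting per-problem regressions rather than only the aggregate score. The minimal sketch below assumes one graded sample per problem in each mode. `ProblemResult`, `compare_modes`, and the `minimal_calls` threshold for a "minimal" number of tool calls are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProblemResult:
    problem_id: str
    text_only_correct: bool     # graded answer with tools disabled
    tool_enabled_correct: bool  # graded answer with tools enabled
    tool_calls: int             # tool calls made in the tool-enabled run


def compare_modes(results: List[ProblemResult], minimal_calls: int = 1) -> Dict[str, object]:
    """Accuracy in both modes, plus problems that regress under tool-enabled
    evaluation despite using at most `minimal_calls` tool calls."""
    n = len(results)
    text_acc = sum(r.text_only_correct for r in results) / n
    tool_acc = sum(r.tool_enabled_correct for r in results) / n
    low_tool_regressions = [
        r.problem_id
        for r in results
        if r.text_only_correct
        and not r.tool_enabled_correct
        and r.tool_calls <= minimal_calls
    ]
    return {
        "text_only_acc": text_acc,
        "tool_enabled_acc": tool_acc,
        "delta": tool_acc - text_acc,
        "low_tool_regressions": low_tool_regressions,
    }
```

A nonempty `low_tool_regressions` list matches the pattern the finding describes: the tool-enabled run is worse even though the model barely used the tools.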

Implementation Steps

Curate problems suited for tool augmentation
Generate teacher trajectories with natural tool-use
Mix tool-use and text-only trajectories (tune ratio)
Run SFT, optimizing for pass@k and response length rather than training loss
Run RLVR with mode-collapse safeguards
Evaluate on both tool-required and text-only benchmarks
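Mixing tool-use and text-only trajectories at a tunable ratio (the third step) can be sketched as sampling a fixed share of tool-use trajectories into the SFT set. This assumes the teacher trajectories are already split by whether they call tools. `Trajectory`, `mix_trajectories`, and any particular `tool_fraction` value are hypothetical; the source says only that the ratio must be tuned.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    problem_id: str
    text: str
    uses_tools: bool


def mix_trajectories(
    tool_trajs: List[Trajectory],
    text_trajs: List[Trajectory],
    tool_fraction: float,
    total: int,
    seed: int = 0,
) -> List[Trajectory]:
    """Build an SFT set of `total` examples, of which round(total * tool_fraction)
    are tool-use trajectories and the rest are text-only."""
    if not 0.0 <= tool_fraction <= 1.0:
        raise ValueError("tool_fraction must be in [0, 1]")
    n_tool = round(total * tool_fraction)
    n_text = total - n_tool
    if n_tool > len(tool_trajs) or n_text > len(text_trajs):
        raise ValueError("not enough trajectories for the requested mix")
    rng = random.Random(seed)  # seeded for a reproducible mix
    mixed = rng.sample(tool_trajs, n_tool) + rng.sample(text_trajs, n_text)
    rng.shuffle(mixed)  # avoid blocks of a single mode within the SFT set
    return mixed
```

One way to tune the ratio is to sweep `tool_fraction`, run SFT on each mix, and compare text-only and tool-enabled scores afterward.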

Applicable Use Cases

Adding tool use to reasoning models
Preventing catastrophic forgetting
Multi-stage training pipelines
Achieving state-of-the-art math reasoning

Triggers / Keywords

tool-integrated reasoning, TIR, SFT+RL pipeline, thinking models, AIME, mode collapse, teacher trajectories