ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

A PyTorch implementation of ReTool from the paper "ReTool: Reinforcement Learning for Strategic Tool Use in LLMs" by Feng et al. (2025).

ReTool enhances long-form reasoning by integrating code interpreter execution into the RL training loop, enabling models to learn when and how to invoke computational tools for mathematical problem solving.

ReTool Rollout Process

Figure 2: Comparison of standard text-based RL vs ReTool's code-integrated training process

🚀 Key Features

- Code interpreter execution woven directly into the RL rollout loop
- Multi-turn generation with `<code>` / `<interpreter>` tags for tool calls and their results
- Interpreter output tokens excluded from the PPO loss
- Binary outcome reward based on final-answer correctness
- `ReToolTrainer` with configurable turn limit, temperature, and completion length

📊 Performance

AIME Results

Figure 1: ReTool achieves 67% accuracy on AIME 2024, significantly outperforming text-based RL (40%)

🛠️ Installation

```bash
git clone https://github.com/yourusername/retool-implementation.git
cd retool-implementation/src
pip install -r requirements.txt
```

🚧 Current Status

This is a research implementation based on the ReTool paper. The core components are implemented but not yet fully tested.

What's Implemented ✅

What Needs Testing/Integration 🔧

For Researchers & Developers

This implementation serves as a foundation for:

📊 Dataset Format

Your dataset should contain dictionaries with:

```python
{
    "prompt": "Solve this math problem: ...",
    "answer": "42"  # Ground truth for reward computation
}
```
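
As a concrete illustration, here is a minimal sketch of assembling such a dataset with the Hugging Face `datasets` library. The library choice and the toy problems are assumptions for illustration; any collection of dicts with these two keys should work.

```python
from datasets import Dataset

# Hypothetical toy records; "answer" is the ground-truth string that the
# model's final answer is compared against when computing the binary reward.
records = [
    {"prompt": "Solve this math problem: What is 6 * 7?", "answer": "42"},
    {"prompt": "Solve this math problem: Compute 2**10.", "answer": "1024"},
]

your_math_dataset = Dataset.from_list(records)
print(your_math_dataset[0])  # first example as a plain dict
```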

🔍 How It Works

  1. Multi-turn Generation: Model generates reasoning step-by-step (the full rollout loop is sketched after this list)
  2. Code Detection: When </code> is generated, extract and execute code
  3. Tool Integration: Append <interpreter>result</interpreter> to context
  4. Continued Reasoning: Model continues with tool feedback
  5. Reward Computation: Binary reward based on final answer correctness
  6. RL Training: PPO updates exclude interpreter tokens from loss
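
A rough sketch of the rollout loop described in steps 1–4 is shown below. This is illustrative only: the helper names (`extract_code`, `run_code`, `rollout`) and the subprocess-based executor are assumptions, not the repository's actual implementation, and a real setup should execute generated code in a proper sandbox.

```python
import re
import subprocess

def extract_code(text: str):
    # Step 2: pull the most recent <code>...</code> block out of the context.
    blocks = re.findall(r"<code>(.*?)</code>", text, re.DOTALL)
    return blocks[-1] if blocks else None

def run_code(code: str, timeout: int = 10) -> str:
    # Illustrative executor: runs the snippet in a subprocess and captures stdout.
    try:
        proc = subprocess.run(
            ["python", "-c", code], capture_output=True, text=True, timeout=timeout
        )
        return proc.stdout if proc.returncode == 0 else proc.stderr
    except subprocess.TimeoutExpired:
        return "Execution timed out."

def rollout(generate, prompt: str, max_turns: int = 10) -> str:
    # Steps 1-4: alternate model generation with code execution feedback.
    context = prompt
    for _ in range(max_turns):
        completion = generate(context)  # model continues its reasoning
        context += completion
        if completion.rstrip().endswith("</code>"):
            code = extract_code(context)
            result = run_code(code)  # step 3: execute the extracted code
            context += f"<interpreter>{result}</interpreter>"  # step 3: feed result back
        else:
            break  # no tool call, so the final answer has been produced
    return context
```

Step 5 then compares the final answer in the returned context against the dataset's `answer` field to produce a binary reward, and step 6 masks the `<interpreter>...</interpreter>` spans out of the PPO loss (see the masking sketch under Key Components).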

⚙️ Key Components

ReToolTrainer Class

Configuration Options

```python
trainer = ReToolTrainer(
    # ... model and data ...
    max_turns=10,              # Maximum reasoning turns
    temperature=0.7,           # Generation temperature
    max_completion_length=1024, # Max tokens per turn
    mask_truncated_completions=True,  # Handle incomplete sequences
)
```
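
Related to step 6 above, interpreter output tokens are excluded from the PPO loss because the policy did not generate them. Below is a minimal sketch of how such a mask could be built; the function name, span convention, and tensor shapes are assumptions for illustration, not the trainer's actual code.

```python
import torch

def interpreter_loss_mask(seq_len: int, interpreter_spans):
    # 1.0 = token contributes to the PPO loss, 0.0 = interpreter-produced token.
    mask = torch.ones(seq_len)
    for start, end in interpreter_spans:  # half-open [start, end) token index ranges
        mask[start:end] = 0.0
    return mask

# Example: a 12-token completion where positions 5-9 came from <interpreter>...</interpreter>
mask = interpreter_loss_mask(12, interpreter_spans=[(5, 10)])
# The per-token policy loss is multiplied by this mask before averaging, so
# gradients only flow through tokens the model itself generated.
```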

💡 Usage Example (Conceptual)

```python
from retool_trainer import ReToolTrainer
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

# This shows the intended API - full testing in progress
trainer = ReToolTrainer(
    model=AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-32B-Instruct"),
    processing_class=AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct"),
    args=TrainingArguments(...),
    train_dataset=your_math_dataset,
    max_turns=10,
)

# trainer.train()  # Full integration testing in progress
```

📈 Results From Paper

🚧 Limitations & TODOs

📚 Citation

```bibtex
@article{feng2025retool,
  title={ReTool: Reinforcement Learning for Strategic Tool Use in LLMs},
  author={Feng, Jiazhan and Huang, Shijue and Qu, Xingwei and Zhang, Ge and Qin, Yujia and Zhong, Baoquan and Jiang, Chengquan and Chi, Jinxin and Zhong, Wanjun},
  journal={arXiv preprint arXiv:2504.11536},
  year={2025}
}
```

📄 License

MIT License - see LICENSE file for details.

🤝 Collaboration welcome: Looking for teammates with complementary skills:

🙏 Acknowledgments


Built with ❤️ for advancing AI reasoning capabilities