A PyTorch implementation of ReTool from the paper "ReTool: Reinforcement Learning for Strategic Tool Use in LLMs" by Feng et al. (2025).
ReTool enhances long-form reasoning by integrating code interpreter execution into the RL training loop, enabling models to learn when and how to invoke computational tools for mathematical problem solving.
Figure 1: ReTool achieves 67% accuracy on AIME 2024, significantly outperforming text-based RL (40%)

Figure 2: Comparison of standard text-based RL vs ReTool's code-integrated training process
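The core idea above — alternating model generation with code execution inside a single rollout — can be sketched as follows. This is a minimal illustration, not the repo's actual implementation: `generate` is a placeholder for the model's generation call, and `run_code` is a toy stand-in for a sandboxed interpreter.

```python
import re

# Matches a tool call emitted by the model, e.g. "<code>2 + 2</code>".
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)

def run_code(code):
    """Toy interpreter stand-in: evaluate one expression and return its repr.
    A real implementation must run code in a proper sandbox."""
    return repr(eval(code))

def rollout(prompt, generate, max_turns=10):
    """Alternate model generation with code execution until the model
    stops emitting tool calls or max_turns is reached."""
    context = prompt
    for _ in range(max_turns):
        completion = generate(context)
        context += completion
        match = CODE_RE.search(completion)
        if match is None:
            break  # no tool call: the model produced its final answer
        result = run_code(match.group(1))
        # Feed the interpreter output back for the next reasoning turn.
        context += f"<interpreter>{result}</interpreter>"
    return context
```

In training, each `<interpreter>...</interpreter>` span is later masked out of the loss, since those tokens were produced by the tool rather than the policy.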
```shell
git clone https://github.com/yourusername/retool-implementation.git
cd retool-implementation/src
pip install -r requirements.txt
```
This is a research implementation based on the ReTool paper. The core components are implemented but not yet fully tested.
This implementation serves as a foundation for further research on tool-integrated RL training.
Your dataset should contain dictionaries with:

```python
{
    "prompt": "Solve this math problem: ...",
    "answer": "42",  # Ground truth for reward computation
}
```
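A minimal toy dataset in this format can be built as a plain list of dicts (in practice you would load many problems, e.g. from a JSONL file, and could wrap the list with Hugging Face `datasets.Dataset.from_list`; the `validate` helper below is illustrative, not part of the repo's API):

```python
# Two toy examples in the expected {"prompt": ..., "answer": ...} format.
train_dataset = [
    {"prompt": "Solve this math problem: what is 6 * 7?", "answer": "42"},
    {"prompt": "Solve this math problem: what is 2**10?", "answer": "1024"},
]

def validate(example):
    """Check an example has the two required string fields."""
    return (isinstance(example.get("prompt"), str)
            and isinstance(example.get("answer"), str))

assert all(validate(ex) for ex in train_dataset)
```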
When code is generated, the trainer extracts and executes it, then appends the result to the context as `<interpreter>result</interpreter>` before the next generation turn.

Key methods:

- `_retool_generate_with_interpreter()`: Multi-turn generation with tool execution
- `_create_interpreter_mask()`: Creates masks for excluding tool outputs from the loss
- `_compute_loss()`: Modified PPO loss with interpreter masking
- `_compute_rewards_and_advantages()`: Binary reward computation

Configuration options:

```python
trainer = ReToolTrainer(
    # ... model and data ...
    max_turns=10,                     # Maximum reasoning turns
    temperature=0.7,                  # Generation temperature
    max_completion_length=1024,      # Max tokens per turn
    mask_truncated_completions=True,  # Handle incomplete sequences
)
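The interpreter masking can be sketched as below. This is a simplified illustration of the idea, not the repo's `_create_interpreter_mask()`: it assumes the special tags arrive as single tokens and works on plain Python lists rather than tensors.

```python
def create_interpreter_mask(token_strings):
    """Return a 0/1 mask over tokens: 0 inside <interpreter>...</interpreter>
    spans (and on the tags themselves), 1 elsewhere, so tool outputs
    receive no policy-gradient loss."""
    mask, inside = [], False
    for tok in token_strings:
        if tok == "<interpreter>":
            inside = True
            mask.append(0)  # the opening tag is also excluded
        elif tok == "</interpreter>":
            mask.append(0)  # the closing tag is also excluded
            inside = False
        else:
            mask.append(0 if inside else 1)
    return mask

def masked_mean(per_token_loss, mask):
    """Average the loss over model-generated tokens only."""
    total = sum(l * m for l, m in zip(per_token_loss, mask))
    return total / max(sum(mask), 1)
```

In the actual trainer this mask would multiply the per-token PPO loss so that interpreter-produced tokens contribute zero gradient.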
```python
from retool_trainer import ReToolTrainer
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

# This shows the intended API - full testing in progress
trainer = ReToolTrainer(
    model=AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-32B-Instruct"),
    processing_class=AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct"),
    args=TrainingArguments(...),
    train_dataset=your_math_dataset,
    max_turns=10,
)
# trainer.train()  # Full integration testing in progress
```
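The binary reward used for training can be sketched as below. The `\boxed{}` answer convention and the last-number fallback are assumptions borrowed from common math-RL setups, not necessarily the repo's exact answer parser:

```python
import re

def binary_reward(completion, ground_truth):
    """Return 1.0 if the model's final answer matches the ground truth,
    else 0.0. Prefers a \\boxed{...} answer; falls back to the last
    number in the completion."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is not None:
        pred = match.group(1).strip()
    else:
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        if not numbers:
            return 0.0
        pred = numbers[-1]
    return 1.0 if pred == ground_truth.strip() else 0.0
```

Because the reward is binary, advantages reduce to comparing each rollout's success against the batch baseline, which keeps reward hacking surface small.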
```bibtex
@article{feng2025retool,
  title={ReTool: Reinforcement Learning for Strategic Tool Use in LLMs},
  author={Feng, Jiazhan and Huang, Shijue and Qu, Xingwei and Zhang, Ge and Qin, Yujia and Zhong, Baoquan and Jiang, Chengquan and Chi, Jinxin and Zhong, Wanjun},
  journal={arXiv preprint arXiv:2504.11536},
  year={2025}
}
```
MIT License - see LICENSE file for details.
🤝 Collaboration welcome: looking for teammates with complementary skills.