Changelog
All notable changes to this project will be documented in this file.
[Unreleased] - 2025-11-06
Added
- Quick-start training script using a 3B parameter model for rapid iteration.
Changed
- Enhanced reward function with multi-turn interaction statistics; a new metric, multi-turn call, is now logged to Weights & Biases (wandb).
Removed
- Redundant `print` statements to streamline console output.