Changelog

All notable changes to this project will be documented in this file.

[Unreleased] - 2025-11-06

Added

  • Quick-start training script using a 3B-parameter model for rapid iteration.

Changed

  • Enhanced reward function with multi-turn interaction statistics; a new metric, multi-turn call, is now logged to Weights & Biases (wandb).

Removed

  • Redundant print statements to streamline console output.