Quickstart¶
Use this workflow to prepare data and launch a quick GRPO run with Qwen2.5-3B-Instruct.
1. Download and preprocess LongTVQA data¶
This step downloads the required question and subtitle files, along with the extracted frames.
bash scripts/download_and_prepare_longtvqa.sh
2. Build offline grounding cache (recommended)¶
This step performs initial clip localization and writes a cache for later training use.
python src/dataset/build_grounding_cache.py \
  --dataset tvqa_plus \
  --questions-path /path/to/train.json \
  --subs-path /path/to/all_episodes_subtitles_by_clips.json \
  --grounding-model "grok-4-fast-reasoning" \
  --grounding-base-url "https://api2.aigcbest.top/v1" \
  --output-dir /path/to/cache_dir \
  --threads 8
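Because the cache build reads the questions and subtitles files, it can help to confirm that both files exist and parse before starting a long run. The following is a minimal sketch, not part of the repository: `check_json_input` is a hypothetical helper, and it assumes only that both inputs are plain JSON files, as their `.json` extensions suggest.

```python
import json
from pathlib import Path


def check_json_input(path):
    """Parse a JSON input file and return its content.

    Hypothetical pre-flight helper (not part of the repository).
    Raises FileNotFoundError if the file is missing and ValueError
    if it parses to an empty object or list.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"missing input file: {p}")
    with p.open("r", encoding="utf-8") as f:
        data = json.load(f)
    if not data:
        raise ValueError(f"{p} parsed but is empty")
    return data
```

For example, calling `check_json_input("/path/to/train.json")` and `check_json_input("/path/to/all_episodes_subtitles_by_clips.json")` before the command above would surface a wrong path or truncated download early.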
3. Start 3B quickstart training¶
This step launches the GRPO quickstart training script.
bash scripts/quickstart_qwen_2_5_3B_grpo.sh
Reference Metrics (for quickstart runs)¶
The following figures come from successful runs and are provided for reference. They are not strict convergence targets, but they are useful sanity checks:
actor_kl_loss: should generally stay bounded and avoid long-term divergence spikes.
critic_rewards_mean: should show a stable upward trend (with normal short-term fluctuations).
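The two sanity checks above can be sketched as simple tests over logged metric values. This is an illustrative sketch under stated assumptions, not the project's tooling: it assumes the metrics have already been exported as plain Python lists, the function names are hypothetical, and the `max_kl` threshold and sample values are invented for the example rather than taken from real runs.

```python
from statistics import mean


def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 items."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def kl_stays_bounded(kl_values, max_kl):
    """True if every actor_kl_loss value is at or below max_kl.

    max_kl is a hypothetical threshold chosen by the user.
    """
    return all(v <= max_kl for v in kl_values)


def reward_trends_up(reward_values, window=5):
    """True if the smoothed reward at the end exceeds the smoothed reward at the start.

    Smoothing tolerates normal short-term fluctuations.
    """
    smoothed = moving_average(reward_values, window)
    return smoothed[-1] > smoothed[0]


if __name__ == "__main__":
    # Invented sample values for illustration only.
    kl = [0.001, 0.004, 0.006, 0.005, 0.007, 0.006]
    rewards = [0.10, 0.14, 0.12, 0.18, 0.21, 0.19, 0.25, 0.27]
    print(kl_stays_bounded(kl, max_kl=0.05))
    print(reward_trends_up(rewards, window=3))
```

Comparing only the first and last smoothed values is a deliberately coarse check. It flags runs where the reward never improves, but it does not replace inspecting the full curves.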
actor_kl_loss (reference)¶

critic_rewards_mean (reference)¶
