chore: vault backup from Joachims-PC at 2025-01-31T11:10
---
title: "Jiayi-Pan/TinyZero: Clean, accessible reproduction of DeepSeek R1-Zero"
source: "https://github.com/Jiayi-Pan/TinyZero"
author:
- "[[GitHub]]"
published:
created: 2025-01-30
description: "Clean, accessible reproduction of DeepSeek R1-Zero - Jiayi-Pan/TinyZero"
tags:
- "clippings"
---
## TinyZero

[TinyZero cover image](https://github.com/Jiayi-Pan/TinyZero/blob/main/cover.png)

TinyZero is a reproduction of [DeepSeek R1 Zero](https://github.com/deepseek-ai/DeepSeek-R1) on the countdown and multiplication tasks. We built it on top of [veRL](https://github.com/volcengine/verl).

Through RL, the 3B base LM develops self-verification and search abilities entirely on its own.

You can experience the "aha moment" yourself for under $30.

Twitter thread: [https://x.com/jiayi_pirate/status/1882839370505621655](https://x.com/jiayi_pirate/status/1882839370505621655)

Full experiment log: [https://wandb.ai/jiayipan/TinyZero](https://wandb.ai/jiayipan/TinyZero)

The paper is on its way!

## Installation

```
conda create -n zero python=3.9

# install torch [or you can skip this step and let vLLM install the correct version for you]
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121

# install vllm
pip3 install vllm==0.6.3  # or you can install 0.5.4, 0.4.2, or 0.3.1
pip3 install ray

# verl
pip install -e .

# flash attention 2
pip3 install flash-attn --no-build-isolation

# quality of life
pip install wandb IPython matplotlib
```
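After installation, a quick sanity check confirms that the key packages import cleanly and that CUDA is visible (a minimal sketch, using only the packages installed above):

```
conda activate zero

# confirm the core dependencies import and report their versions
python -c "import torch; print('torch', torch.__version__, '| CUDA available:', torch.cuda.is_available())"
python -c "import vllm; print('vllm', vllm.__version__)"
python -c "import ray, flash_attn; print('ray', ray.__version__, '| flash-attn', flash_attn.__version__)"
```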

## Countdown task

**Data Preparation**

```
conda activate zero
python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}
```
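It can be worth spot-checking the output before training. The sketch below assumes the preprocessor writes parquet splits (e.g. `train.parquet`) into `--local_dir`, and that pandas with a parquet engine is available in the environment; adjust the file name to whatever actually appears in the directory:

```
# list whatever the preprocessing step produced
ls -lh {path_to_your_dataset}

# peek at the first row of the train split (requires pandas + pyarrow;
# the exact file name may differ -- check the ls output above)
python -c "import pandas as pd; df = pd.read_parquet('{path_to_your_dataset}/train.parquet'); print(df.shape); print(df.iloc[0])"
```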

### Run Training

If you run out of VRAM with the commands below, try adding `critic.model.enable_gradient_checkpointing=True` to the script (see the sketch below).
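
For orientation, the snippet below is a hypothetical, abridged sketch of the trainer call inside `scripts/train_tiny_zero.sh` (the real script passes many more overrides); the point is only that the flag is one more `key=value` override on the veRL trainer command:

```
# Hypothetical, abridged sketch -- not the actual contents of
# scripts/train_tiny_zero.sh, which sets many more overrides.
python3 -m verl.trainer.main_ppo \
    data.train_files="$DATA_DIR/train.parquet" \
    actor_rollout_ref.model.path="$BASE_MODEL" \
    critic.model.path="$BASE_MODEL" \
    critic.model.enable_gradient_checkpointing=True  # add this override when you hit OOM
```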

**Single GPU**

Works for models up to about 1.5B parameters. We know that the Qwen2.5-0.5B base model fails to learn reasoning.

```
export N_GPUS=1
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=1
export EXPERIMENT_NAME=countdown-qwen2.5-0.5b
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh
```
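`BASE_MODEL` should point at a local checkpoint directory. If you do not already have one, one way to fetch the 0.5B base model referenced by the experiment name above is the Hugging Face CLI (it ships with `huggingface_hub`, which the stack above pulls in); the target directory here is just an example:

```
# download Qwen2.5-0.5B into a local directory and point BASE_MODEL at it
huggingface-cli download Qwen/Qwen2.5-0.5B --local-dir ./models/Qwen2.5-0.5B
export BASE_MODEL=./models/Qwen2.5-0.5B
```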

**3B+ model** At this scale, the base model is able to develop sophisticated reasoning skills.

```
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh
```
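On a node with more than two GPUs, a common approach (a sketch, not something the script requires) is to pin the run to specific devices with `CUDA_VISIBLE_DEVICES`, so that `N_GPUS` and `ROLLOUT_TP_SIZE` match what is actually visible:

```
# expose only GPUs 0 and 1 to the run; the settings above assume two visible GPUs
export CUDA_VISIBLE_DEVICES=0,1
bash ./scripts/train_tiny_zero.sh
```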

### Instruct Ablation

We also experiment with Qwen2.5-3B-Instruct.

**Data Preparation** To follow the chat template, we need to reprocess the data:

```
conda activate zero
python examples/data_preprocess/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}
```

**Training**

```
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh
```

## Acknowledgements

- We run our experiments on top of [veRL](https://github.com/volcengine/verl).
- We use base models from the [Qwen2.5](https://github.com/QwenLM/Qwen2.5) series.

## Citation

```
@misc{tinyzero,
  author       = {Jiayi Pan and Junjie Zhang and Xingyao Wang and Lifan Yuan and Hao Peng and Alane Suhr},
  title        = {TinyZero},
  howpublished = {https://github.com/Jiayi-Pan/TinyZero},
  note         = {Accessed: 2025-01-24},
  year         = {2025}
}
```