---
title: "Jiayi-Pan/TinyZero: Clean, accessible reproduction of DeepSeek R1-Zero"
source: "https://github.com/Jiayi-Pan/TinyZero"
author:
  - "[[GitHub]]"
published:
created: 2025-01-30
description: "Clean, accessible reproduction of DeepSeek R1-Zero - Jiayi-Pan/TinyZero"
tags:
  - "clippings"
---

## TinyZero

[Cover image](https://github.com/Jiayi-Pan/TinyZero/blob/main/cover.png)

TinyZero is a reproduction of [DeepSeek R1 Zero](https://github.com/deepseek-ai/DeepSeek-R1) on countdown and multiplication tasks. We build upon [veRL](https://github.com/volcengine/verl).

Through RL, the 3B base LM develops self-verification and search abilities all on its own.

You can experience the Aha moment yourself for < $30.

Twitter thread: [https://x.com/jiayi\_pirate/status/1882839370505621655](https://x.com/jiayi_pirate/status/1882839370505621655)

Full experiment log: [https://wandb.ai/jiayipan/TinyZero](https://wandb.ai/jiayipan/TinyZero)

Paper's on its way!

## Installation

```
conda create -n zero python=3.9
# activate the new environment so the packages below are installed into it
conda activate zero
# install torch [or you can skip this step and let vllm install the correct version for you]
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1
pip3 install ray

# verl (run from the repository root)
pip install -e .

# flash attention 2
pip3 install flash-attn --no-build-isolation
# quality of life
pip install wandb IPython matplotlib
```
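
After installing, it can help to confirm that each package actually imports from the `zero` environment. The following is a minimal sketch and not part of TinyZero: `check_modules` is a hypothetical helper, and the module names in the usage note assume the usual import names for the packages installed above (for example `flash_attn` for `flash-attn`, and `verl` for the editable install).

```shell
# Hypothetical helper (not part of TinyZero): report which Python modules
# fail to import with the python3 found on the current PATH.
check_modules() {
    missing=0
    for module in "$@"; do
        if python3 -c "import $module" >/dev/null 2>&1; then
            echo "ok:      $module"
        else
            echo "missing: $module"
            missing=1
        fi
    done
    # Return non-zero if at least one import failed.
    return "$missing"
}
```

For example, `check_modules torch vllm ray verl flash_attn wandb IPython matplotlib` prints one line per module and returns a non-zero status if any import fails.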

## Countdown task

**Data Preparation**

```
conda activate zero
python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}
```
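
Before launching training, a quick check that preprocessing produced its output can save a failed run. This sketch assumes `countdown.py` writes `train.parquet` and `test.parquet` into `--local_dir`; those file names are an assumption here, so adjust them if your copy of the script differs. `check_dataset` is a hypothetical helper, not something the repository provides.

```shell
# Hypothetical helper: verify that the dataset directory holds non-empty
# copies of the files the training script is assumed to read.
check_dataset() {
    dir=$1
    status=0
    for name in train.parquet test.parquet; do
        # -s is true only if the file exists and is larger than zero bytes.
        if [ -s "$dir/$name" ]; then
            echo "found:   $dir/$name"
        else
            echo "missing: $dir/$name"
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_dataset {path_to_your_dataset}` returns a non-zero status if either file is absent or empty.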

### Run Training

If you run out of VRAM with the commands below, try adding `critic.model.enable_gradient_checkpointing=True` to the script.

**Single GPU**

Works for models <= 1.5B. Qwen2.5-0.5B base is known to fail to learn reasoning.

```
export N_GPUS=1
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=1
export EXPERIMENT_NAME=countdown-qwen2.5-0.5b
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh
```

**3B+ model**

In this case, the base model is able to develop sophisticated reasoning skills.

```
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh
```
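
The two recipes above differ only in the exported values, so a forgotten `export` is an easy mistake to make. The sketch below checks the six variables the recipes set before the script is called. `require_env` is a hypothetical helper, not something TinyZero ships, and it assumes `train_tiny_zero.sh` needs all six variables.

```shell
# Hypothetical pre-flight check: fail early if any variable exported in the
# recipes above is unset or empty.
require_env() {
    status=0
    for name in N_GPUS BASE_MODEL DATA_DIR ROLLOUT_TP_SIZE EXPERIMENT_NAME VLLM_ATTENTION_BACKEND; do
        # Look the variable up by name; an unset variable expands to "".
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset: $name" >&2
            status=1
        fi
    done
    return "$status"
}
```

A launch can then be guarded with `require_env && bash ./scripts/train_tiny_zero.sh`, which skips the training script when a variable is missing.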

### Instruct Ablation

We also experiment with Qwen2.5-3B Instruct.

**Data Preparation**

To follow the chat template, we need to reprocess the data:

```
conda activate zero
python examples/data_preprocess/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}
```

**Training**

```
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh
```
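
Every recipe in this README sets `ROLLOUT_TP_SIZE` equal to `N_GPUS`. If you change one of them, a reasonable guard is that the tensor-parallel size should divide the GPU count evenly. That constraint is an assumption about how the bundled veRL version shards rollout workers, not something this README states, and `check_tp_size` is a hypothetical helper.

```shell
# Hypothetical guard: the rollout tensor-parallel size is assumed to need to
# divide the number of GPUs evenly. Also rejects sizes below 1 or above the
# GPU count.
check_tp_size() {
    gpus=$1
    tp=$2
    # The first test short-circuits, so the modulo never divides by zero.
    if [ "$tp" -lt 1 ] || [ "$gpus" -lt "$tp" ] || [ $((gpus % tp)) -ne 0 ]; then
        echo "ROLLOUT_TP_SIZE=$tp does not evenly divide N_GPUS=$gpus" >&2
        return 1
    fi
    return 0
}
```

It can be called as `check_tp_size "$N_GPUS" "$ROLLOUT_TP_SIZE"` right after the exports.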

## Acknowledgements

- We run our experiments based on [veRL](https://github.com/volcengine/verl).
- We use the [Qwen2.5](https://github.com/QwenLM/Qwen2.5) series base models.

## Citation

```
@misc{tinyzero,
author = {Jiayi Pan and Junjie Zhang and Xingyao Wang and Lifan Yuan and Hao Peng and Alane Suhr},
title = {TinyZero},
howpublished = {https://github.com/Jiayi-Pan/TinyZero},
note = {Accessed: 2025-01-24},
year = {2025}
}
```