Jiayi-Pan / TinyZero
Minimal reproduction of DeepSeek R1-Zero
TinyZero

TinyZero is a reproduction of DeepSeek R1-Zero on the countdown and multiplication tasks. We built it on top of veRL.
Through RL, the 3B base LM develops self-verification and search abilities entirely on its own.
You can experience the Aha moment yourself for < $30.
Twitter thread: https://x.com/jiayi_pirate/status/1882839370505621655
Full experiment log: https://wandb.ai/jiayipan/TinyZero
📢: We release Adaptive Parallel Reasoning, where we explore a new dimension in scaling reasoning models.
Installation
conda create -n zero python=3.9
# install torch [or you can skip this step and let vllm install the correct version for you]
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1
pip3 install ray
# verl
pip install -e .
# flash attention 2
pip3 install flash-attn --no-build-isolation
# quality of life
pip install wandb IPython matplotlib
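
After installing, a quick sanity check can save a failed run later. This is a minimal sketch, not part of the original README: it only confirms the core dependencies import and that CUDA is visible.

# sanity_check.py -- hedged sketch, not from the repository
import torch
import vllm
import ray

print("torch:", torch.__version__, "| cuda available:", torch.cuda.is_available())
print("vllm:", vllm.__version__)
print("ray:", ray.__version__)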
Countdown task
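In the countdown task, the model is given a list of numbers and a target, and must combine each number exactly once with +, -, *, / to reach the target. The sketch below is illustrative only, not the repository's reward code (the function name and example instance are made up); it shows how a candidate solution can be checked.

# verify_countdown.py -- illustrative sketch, not TinyZero's reward function
import re

def verify(equation: str, numbers: list, target: int) -> bool:
    # every provided number must be used exactly once
    used = sorted(int(n) for n in re.findall(r"\d+", equation))
    if used != sorted(numbers):
        return False
    # allow only arithmetic characters before evaluating
    if not re.fullmatch(r"[\d+\-*/(). ]+", equation):
        return False
    try:
        return abs(eval(equation) - target) < 1e-6
    except (SyntaxError, ZeroDivisionError):
        return False

print(verify("55 + 36 - 7 - 19", [19, 36, 55, 7], 65))  # True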
Data Preparation
conda activate zero
python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}
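
To spot-check the generated data, you can load it with pandas. This is a hedged sketch: it assumes the preprocessing script writes train.parquet and test.parquet under --local_dir, following veRL's data conventions; adjust the path and file names if your output differs.

# inspect_dataset.py -- hedged sketch; path placeholder matches the command above
import pandas as pd

df = pd.read_parquet("{path_to_your_dataset}/train.parquet")
print(df.columns.tolist())  # inspect the schema
print(df.iloc[0])           # print one full example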
Run Training
conda activate zero
For the following commands, if you hit an out-of-VRAM error, try adding critic.model.enable_gradient_checkpointing=True to the script, and check out the discussion here.
Single GPU
Works for models <= 1.5B. For the Qwen2.5-0.5B base model, we know it fails to learn reasoning.
export N_GPUS=1
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=1
export EXPERIMENT_NAME=countdown-qwen2.5-0.5b
export VLLM_ATTENTION_BACKEND=XFORMERS
bash ./scripts/train_tiny_zero.sh
3B+ model
In this case, the base model is able to develop sophisticated reasoning skills.
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b
export VLLM_ATTENTION_BACKEND=XFORMERS
bash ./scripts/train_tiny_zero.sh
Instruct Ablation
We experiment with Qwen2.5-3B-Instruct too.
Data Preparation
To follow the chat template, we need to reprocess the data:
conda activate zero
python examples/data_preprocess/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}
Training
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct
export VLLM_ATTENTION_BACKEND=XFORMERS
bash ./scripts/train_tiny_zero.sh
Acknowledgements
Citation
@misc{tinyzero,
  author       = {Jiayi Pan and Junjie Zhang and Xingyao Wang and Lifan Yuan and Hao Peng and Alane Suhr},
  title        = {TinyZero},
  howpublished = {https://github.com/Jiayi-Pan/TinyZero},
  note         = {Accessed: 2025-01-24},
  year         = {2025}
}