# Dream

> 📄 Paper: [Dream 7B: Diffusion Large Language Models](https://arxiv.org/abs/2508.15487) | 💻 Code: [github.com/DreamLM/Dream](https://github.com/DreamLM/Dream)

Resources and examples for training (finetuning and pretraining) and evaluating the **Dream** diffusion language models.

## Table of Contents
- [Setup](#setup)
- [Files overview](#files-overview)
- [Training](#training)
- [Inference](#inference)
- [Evaluation](#evaluation)

## Setup
> [!IMPORTANT]
> **Slurm users:** Update `scripts/train.slurm.sh` and `mkdir logps`: see [(optional) Slurm setup](/README.md/#optional-slurm-setup) for details.

## Files overview
```
# tools relevant to Dream
dllm/pipelines/dream
├── __init__.py                     # Package initialization
├── models/
│   ├── configuration_dream.py      # Dream model configuration
│   ├── generation_utils.py         # Diffusion-based generation logic
│   ├── modeling_dream.py           # Core Dream model architecture
│   └── tokenization_dream.py       # Tokenizer implementation for Dream
├── generator.py                    # Inference logic
├── trainer.py                      # Training logic (pretraining and SFT)
└── utils.py                        # Auxiliary utilities and helper functions

# example entry points for training / inference / evaluation
examples/dream
├── chat.py                         # Interactive inference example
├── eval.sh                         # Automatic evaluation script
├── generate.py                     # Inference example
├── pt.py                           # Pretraining example
├── README.md                       # Documentation (you are here)
└── sft.py                          # Supervised finetuning example
```

## Training

### Finetuning
For example, to SFT [`Dream-v0-Base-7B`](https://huggingface.co/Dream-org/Dream-v0-Base-7B) for instruction following on 8 GPUs, run:
```shell
accelerate launch \
    --config_file scripts/accelerate_configs/fsdp.yaml \
    examples/dream/sft.py \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --dataset_args "allenai/tulu-3-sft-mixture" \
    --output_dir "models/Dream-7B-SFT/tulu-3-sft-mixture" \
    --max_length 1024 \
    --num_train_epochs 4 \
    --learning_rate 2e-5
```
If you are using Slurm and want to train across, for example, 2 nodes
(16 GPUs total), run:
```shell
sbatch --nodes=2 --gres=gpu:8 scripts/train.slurm.sh \
    --accelerate_config "fsdp" \
    --script_path "examples/dream/sft.py" \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --dataset_args "allenai/tulu-3-sft-mixture" \
    --output_dir "models/Dream-7B-SFT/tulu-3-sft-mixture" \
    --max_length 1024 \
    --num_train_epochs 4 \
    --learning_rate 2e-5
```

#### Reproducing [`Dream-v0-Instruct-7B`](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B)
We attempted to reproduce [`Dream-v0-Instruct-7B`](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B) by finetuning [`Dream-v0-Base-7B`](https://huggingface.co/Dream-org/Dream-v0-Base-7B) with our training pipeline on the public instruction-following dataset [`allenai/tulu-3-sft-mixture`](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture):
```shell
# preprocess SFT data (optional, but avoids redundant preprocessing in multi-node training)
PYTHONPATH=. python dllm/tools/preprocess_sft_dataset.py \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --sft_map_fn_path "examples.dream.sft.sft_map_fn" \
    --dataset_args "allenai/tulu-3-sft-mixture" \
    --output_dir "data/sft/dream/tulu-3-sft-mixture" \
    --num_proc 64

# train on 24*8=192 A100s with FSDP; takes about 8 hours
sbatch --nodes=24 --gres=gpu:8 scripts/train.slurm.sh \
    --accelerate_config "fsdp" \
    --script_path "examples/dream/sft.py" \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --dataset_args "data/sft/dream/tulu-3-sft-mixture" \
    --load_preprocessed_data True \
    --output_dir "models/Dream-7B-SFT-tulu3-fsdp-bs4-len2048-ep5-lr1e-5" \
    --max_length 2048 \
    --truncation "right" \
    --group_by_length True \
    --num_train_epochs 5 \
    --learning_rate 1e-5 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 2 \
    --per_device_eval_batch_size 2 \
    --eval_on_start False \
    --eval_steps 0.1 \
    --save_steps 0.05
```

### Pretraining
Pretrain on
[`mlfoundations/dclm-baseline-1.0`](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0) from scratch using 192 GPUs (24x8) and FSDP:
```shell
sbatch --nodes=24 --gres=gpu:8 scripts/train.slurm.sh \
    --accelerate_config "fsdp" \
    --script_path "examples/dream/pt.py" \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --dataset_args "mlfoundations/dclm-baseline-1.0" \
    --output_dir "models/Dream-7B-PT/dclm-baseline-1.0" \
    --max_length 1024 \
    --max_steps 2000 \
    --learning_rate 3e-4
```

## Inference
We support batch inference for standard generation and infilling:
```shell
python examples/dream/generate.py --model_name_or_path "Dream-org/Dream-v0-Instruct-7B"
```
We also support interactive multi-turn dialogue with visualization:
```shell
python examples/dream/chat.py --model_name_or_path "Dream-org/Dream-v0-Instruct-7B"
```

## Evaluation
> Read [(optional) Evaluation setup](/README.md/#optional-evaluation-setup) before running evaluation.

For example, to evaluate [`Dream-v0-Instruct-7B`](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B) on [`MMLU-Pro`](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro) using 4 GPUs, run:
```shell
# Use model_args to adjust the generation arguments for evaluation.
accelerate launch --num_processes 4 \
    dllm/pipelines/dream/eval.py \
    --tasks "mmlu_pro" \
    --model "dream" \
    --apply_chat_template \
    --num_fewshot 0 \
    --model_args "pretrained=Dream-org/Dream-v0-Instruct-7B,mc_num=1,max_new_tokens=128,max_length=128,steps=128,temperature=0.1,top_p=0.9,add_bos_token=true,escape_until=true"
```

To automatically evaluate [`Dream-v0-Base-7B`](https://huggingface.co/Dream-org/Dream-v0-Base-7B) and [`Dream-v0-Instruct-7B`](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B) on all benchmarks, run:
```shell
bash examples/dream/eval.sh --model_name_or_path "Dream-org/Dream-v0-Instruct-7B" --instruct True
bash examples/dream/eval.sh --model_name_or_path "Dream-org/Dream-v0-Base-7B" --instruct False
```

### Evaluation results

> Rows marked (evaluated) were produced with our framework; rows marked (reported) come from the original paper. All evaluation settings follow the configurations in the [Dream](https://github.com/DreamLM/Dream) repository, with minor adjustments. Placeholder entries (“–”) indicate results not yet evaluated; full results will be released soon.

|                 | MMLU | BBH | ARC‑C | ARC‑E | Hellaswag | WinoGrande | PIQA | GSM8K | Math | GPQA | HumanEval | MBPP | RACE | Countdown | Sudoku | Trip planning |
|:----------------|:-------:|:-------:|:-----:|:-----:|:-----------:|:------------:|:----:|:-----:|:----:|:----:|:-----------:|:----:|:------:|:-----------:|:----:|:-----------:|
| [`Dream-v0-Base-7B`](https://huggingface.co/Dream-org/Dream-v0-Base-7B) (reported) | 69.5 | 57.9 | 59.9 | 83.9 | 73.3 | 74.8 | 75.8 | 77.2 | 39.6 | 36.6 | 57.9 | 56.2 | 44.7 | 16.0 | 81.0 | 17.8 |
| [`Dream-v0-Base-7B`](https://huggingface.co/Dream-org/Dream-v0-Base-7B) (evaluated) | – | – | 59.7 | 83.3 | 73.1 | 72.9 | 72.0 | 69.6 | – | 35.5 | 45.8 | – | 43.0 | – | – | – |

Table 1. Evaluation results of `Dream-v0-Base-7B`.

|                 | MMLU | MMLU-Pro | GSM8K | Math | GPQA | HumanEval | MBPP | IFEval |
|:----------------|:----:|:---------:|:-----:|:----:|:----:|:-----------:|:----:|:----:|
| [`Dream-v0-Instruct-7B`](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B) (reported) | 67.0 | 43.3 | 81.0 | 39.2 | 33.0 | 55.5 | 58.8 | 62.5 |
| [`Dream-v0-Instruct-7B`](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B) (evaluated) | – | 43.0 | 82.6 | 39.9 | 32.4 | 59.1 | – | 62.3 |

Table 2. Evaluation results of `Dream-v0-Instruct-7B`.
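
To make the reported-versus-evaluated comparison in Tables 1 and 2 easier to scan, the following is a minimal sketch that computes the per-benchmark gap (evaluated minus reported) and skips placeholder entries. The scores are copied from the two tables; the names `BASE`, `INSTRUCT`, `score_gaps`, and `largest_gap` are hypothetical helpers for illustration and are not part of the dllm codebase.

```python
# Scores copied from Tables 1 and 2 as (reported, evaluated).
# None marks a placeholder entry that has not been evaluated yet.
BASE = {
    "MMLU": (69.5, None), "BBH": (57.9, None), "ARC-C": (59.9, 59.7),
    "ARC-E": (83.9, 83.3), "Hellaswag": (73.3, 73.1), "WinoGrande": (74.8, 72.9),
    "PIQA": (75.8, 72.0), "GSM8K": (77.2, 69.6), "Math": (39.6, None),
    "GPQA": (36.6, 35.5), "HumanEval": (57.9, 45.8), "MBPP": (56.2, None),
    "RACE": (44.7, 43.0), "Countdown": (16.0, None), "Sudoku": (81.0, None),
    "Trip planning": (17.8, None),
}
INSTRUCT = {
    "MMLU": (67.0, None), "MMLU-Pro": (43.3, 43.0), "GSM8K": (81.0, 82.6),
    "Math": (39.2, 39.9), "GPQA": (33.0, 32.4), "HumanEval": (55.5, 59.1),
    "MBPP": (58.8, None), "IFEval": (62.5, 62.3),
}


def score_gaps(table):
    """Return {benchmark: evaluated - reported}, skipping unevaluated entries."""
    return {
        name: round(evaluated - reported, 1)
        for name, (reported, evaluated) in table.items()
        if evaluated is not None
    }


def largest_gap(table):
    """Return the (benchmark, gap) pair with the largest absolute gap."""
    return max(score_gaps(table).items(), key=lambda item: abs(item[1]))


if __name__ == "__main__":
    for label, table in (("Base", BASE), ("Instruct", INSTRUCT)):
        print(label)
        for name, gap in score_gaps(table).items():
            print(f"  {name:<12} {gap:+.1f}")
```

In both tables the largest gap is on HumanEval: the evaluated Base score is 12.1 points below the reported one, while the evaluated Instruct score is 3.6 points above it.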