# tools relevant to Dream
dllm/pipelines/dream
├── __init__.py                     # Package initialization
├── models/
│   ├── configuration_dream.py      # Dream model configuration
│   ├── generation_utils.py         # Diffusion-based generation logic
│   ├── modeling_dream.py           # Core Dream model architecture
│   └── tokenization_dream.py       # Tokenizer implementation for Dream
├── generator.py                    # Inference logic
├── trainer.py                      # Training logic (pretraining and SFT)
└── utils.py                        # Auxiliary utilities and helper functions

# example entry points for training / inference / evaluation
examples/dream
├── chat.py                         # Interactive inference example
├── eval.sh                         # Automatic evaluation script
├── generate.py                     # Inference example
├── pt.py                           # Pretraining example
├── README.md                       # Documentation (you are here)
└── sft.py                          # Supervised finetuning example

Training

Finetuning

For example, to SFT Dream-v0-Base-7B for instruction following on 8 GPUs, run:

accelerate launch \
    --config_file scripts/accelerate_configs/fsdp.yaml \
    examples/dream/sft.py \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --dataset_args "allenai/tulu-3-sft-mixture" \
    --output_dir "models/Dream-7B-SFT/tulu-3-sft-mixture" \
    --max_length 1024 \
    --num_train_epochs 4 \
    --learning_rate 2e-5

If you are using Slurm and want to train across, for example, 2 nodes (16 GPUs total), run:

sbatch --nodes=2 --gres=gpu:8 scripts/train.slurm.sh \
    --accelerate_config "fsdp" \
    --script_path "examples/dream/sft.py" \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --dataset_args "allenai/tulu-3-sft-mixture" \
    --output_dir "models/Dream-7B-SFT/tulu-3-sft-mixture" \
    --max_length 1024 \
    --num_train_epochs 4 \
    --learning_rate 2e-5

Reproducing `Dream-v0-Instruct-7B`

We tried our best to reproduce Dream-v0-Instruct-7B by finetuning Dream-v0-Base-7B using our training pipeline on the public instruction-following dataset allenai/tulu-3-sft-mixture:

# preprocessing SFT data (optional, but can avoid redundant preprocessing for multi-node training)
PYTHONPATH=. python dllm/tools/preprocess_sft_dataset.py \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --sft_map_fn_path "examples.dream.sft.sft_map_fn" \
    --dataset_args "allenai/tulu-3-sft-mixture" \
    --output_dir "data/sft/dream/tulu-3-sft-mixture" \
    --num_proc 64

# train on 24*8=192 A100s with FSDP; takes about 8 hours
sbatch --nodes=24 --gres=gpu:8 scripts/train.slurm.sh \
    --accelerate_config "fsdp" \
    --script_path "examples/dream/sft.py" \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --dataset_args "data/sft/dream/tulu-3-sft-mixture" \
    --load_preprocessed_data True \
    --output_dir "models/Dream-7B-SFT-tulu3-fsdp-bs4-len2048-ep5-lr1e-5" \
    --max_length 2048 \
    --truncation "right" \
    --group_by_length True \
    --num_train_epochs 5 \
    --learning_rate 1e-5 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 2 \
    --per_device_eval_batch_size 2 \
    --eval_on_start False \
    --eval_steps 0.1 \
    --save_steps 0.05
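
As a quick sanity check on the settings above, the global batch size per optimizer step can be derived from the node count, GPUs per node, per-device batch size, and gradient accumulation steps. The sketch below assumes standard data-parallel semantics (each GPU is one data-parallel rank under FSDP); the variable names are illustrative and are not part of the training script.

```python
# Values copied from the sbatch command above.
nodes = 24
gpus_per_node = 8
per_device_train_batch_size = 2
gradient_accumulation_steps = 2
max_length = 2048

# Assumption: every GPU acts as one data-parallel rank.
world_size = nodes * gpus_per_node

# Sequences consumed per optimizer step across the whole job.
sequences_per_step = (
    world_size * per_device_train_batch_size * gradient_accumulation_steps
)

# Upper bound on tokens per step; real batches may be shorter than max_length.
max_tokens_per_step = sequences_per_step * max_length

print(f"GPUs: {world_size}")
print(f"sequences per optimizer step: {sequences_per_step}")
print(f"max tokens per optimizer step: {max_tokens_per_step}")
```

The per-device product (2 x 2 = 4) matches the `bs4` tag in the output directory name. If the script follows the Hugging Face `TrainingArguments` convention, values below 1 for `--eval_steps` and `--save_steps` are read as fractions of the total number of training steps, which would mean evaluating every 10% and saving every 5% of training; this is an assumption about the script, not something stated above.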

Pretraining

Pretrain on mlfoundations/dclm-baseline-1.0 from scratch using 192 GPUs (24x8) and FSDP:

sbatch --nodes=24 --gres=gpu:8 scripts/train.slurm.sh \
    --accelerate_config "fsdp" \
    --script_path "examples/dream/pt.py" \
    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
    --dataset_args "mlfoundations/dclm-baseline-1.0" \
    --output_dir "models/Dream-7B-PT/dclm-baseline-1.0" \
    --max_length 1024 \
    --max_steps 2000 \
    --learning_rate 3e-4

Inference

We support batch inference for standard generation and infilling:

python examples/dream/generate.py --model_name_or_path "Dream-org/Dream-v0-Instruct-7B"

We also support interactive multi-turn dialogue with visualization:

python examples/dream/chat.py --model_name_or_path "Dream-org/Dream-v0-Instruct-7B"

Evaluation

Before running evaluation, read the (optional) Evaluation setup instructions.

For example, to evaluate Dream-v0-Instruct-7B on MMLU-Pro using 4 GPUs, run:

# Use model_args to adjust the generation arguments for evaluation.
accelerate launch --num_processes 4 \
    dllm/pipelines/dream/eval.py \
    --tasks "mmlu_pro" \
    --model "dream" \
    --apply_chat_template \
    --num_fewshot 0 \
    --model_args "pretrained=Dream-org/Dream-v0-Instruct-7B,mc_num=1,max_new_tokens=128,max_length=128,steps=128,temperature=0.1,top_p=0.9,add_bos_token=true,escape_until=true"
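
The `--model_args` value is a comma-separated list of `key=value` pairs. The sketch below shows how such a string breaks down into individual settings. `parse_model_args` is a hypothetical helper written for illustration, not the parser the evaluation harness actually uses, and it keeps every value as a raw string.

```python
def parse_model_args(arg_string: str) -> dict[str, str]:
    """Split 'k1=v1,k2=v2' into a dict, keeping values as raw strings.

    Hypothetical helper for illustration only; it assumes every item
    contains an '=' and that values contain no commas.
    """
    pairs = (item.split("=", 1) for item in arg_string.split(",") if item)
    return {key.strip(): value.strip() for key, value in pairs}


# String copied from the evaluation command above.
model_args = (
    "pretrained=Dream-org/Dream-v0-Instruct-7B,mc_num=1,max_new_tokens=128,"
    "max_length=128,steps=128,temperature=0.1,top_p=0.9,"
    "add_bos_token=true,escape_until=true"
)

parsed = parse_model_args(model_args)
for key, value in parsed.items():
    print(f"{key:>16} = {value}")
```

To change a generation setting, edit the matching pair in the string, for example `steps=128` or `temperature=0.1`, and leave the other pairs untouched.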

To automatically evaluate Dream-v0-Base-7B and Dream-v0-Instruct-7B on all benchmarks, run:

bash examples/dream/eval.sh --model_name_or_path "Dream-org/Dream-v0-Instruct-7B" --instruct True
bash examples/dream/eval.sh --model_name_or_path "Dream-org/Dream-v0-Base-7B" --instruct False

Evaluation results

Results marked (evaluated) were obtained with our framework, while results marked (reported) come from the original paper. All evaluation settings follow the configurations in the Dream repository, with minor adjustments. Entries marked “–” have not yet been evaluated; full results will be released soon.

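| Model | MMLU | BBH | ARC-C | ARC-E | HellaSwag | WinoGrande | PIQA | GSM8K | Math | GPQA | HumanEval | MBPP | RACE | Countdown | Sudoku | Trip planning |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `Dream-v0-Base-7B` (reported) | 69.5 | 57.9 | 59.9 | 83.9 | 73.3 | 74.8 | 75.8 | 77.2 | 39.6 | 36.6 | 57.9 | 56.2 | 44.7 | 16.0 | 81.0 | 17.8 |
| `Dream-v0-Base-7B` (evaluated) | – | – | 59.7 | 83.3 | 73.1 | 72.9 | 72.0 | 69.6 | – | 35.5 | 45.8 | – | 43.0 | – | – | – |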

Table 1. Evaluation results of Dream-v0-Base-7B.

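| Model | MMLU | MMLU-Pro | GSM8K | Math | GPQA | HumanEval | MBPP | IFEval |
|---|---|---|---|---|---|---|---|---|
| `Dream-v0-Instruct-7B` (reported) | 67.0 | 43.3 | 81.0 | 39.2 | 33.0 | 55.5 | 58.8 | 62.5 |
| `Dream-v0-Instruct-7B` (evaluated) | – | 43.0 | 82.6 | 39.9 | 32.4 | 59.1 | – | 62.3 |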

Table 2. Evaluation results of Dream-v0-Instruct-7B.
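
To see how closely the evaluated numbers track the reported ones, the per-benchmark gap in Table 2 can be computed directly. The sketch below uses only the benchmarks for which both values are available; the numbers are copied from the table, and the variable names are illustrative.

```python
# Benchmarks from Table 2 that have both a reported and an evaluated score.
benchmarks = ["MMLU-Pro", "GSM8K", "Math", "GPQA", "HumanEval", "IFEval"]
reported = [43.3, 81.0, 39.2, 33.0, 55.5, 62.5]
evaluated = [43.0, 82.6, 39.9, 32.4, 59.1, 62.3]

# Positive delta: our evaluation scores higher than the reported number.
deltas = {
    name: round(ours - theirs, 1)
    for name, ours, theirs in zip(benchmarks, evaluated, reported)
}

for name, delta in deltas.items():
    print(f"{name:<10} {delta:+.1f}")

# Benchmark with the largest absolute gap.
largest_gap = max(deltas, key=lambda name: abs(deltas[name]))
print(f"largest gap: {largest_gap}")
```

On these six benchmarks the largest gap is on HumanEval, and the others differ by less than two points.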
