1127 update to latest

2025-11-27 15:44:17 +08:00
parent e16c84aab2
commit a34d39430e
153 changed files with 25705 additions and 53 deletions
--- a/dllm/examples/dream/README.md
+++ b/dllm/examples/dream/README.md
@@ -0,0 +1,187 @@
+# Dream
+
+> 📄 Paper: [Dream 7B: Diffusion Large Language Models](https://arxiv.org/abs/2508.15487) ｜ 💻 Code: [github.com/DreamLM/Dream](https://github.com/DreamLM/Dream)
+
+Resources and examples for training (finetuning and pretraining) and evaluating the **Dream** diffusion language models.
+
+## Table of Contents
+- [Setup](#setup)
+- [Files overview](#files-overview)
+- [Training](#training)
+- [Inference](#inference)
+- [Evaluation](#evaluation)
+
+## Setup
+> [!IMPORTANT]  
+> **Slurm users:** Update `scripts/train.slurm.sh` and run `mkdir logps`; see [(optional) Slurm setup](/README.md/#optional-slurm-setup) for details.
+>
+
+
+## Files overview
+```
+# tools relevant to Dream
+dllm/pipelines/dream
+├── __init__.py                     # Package initialization
+├── models/
+│   ├── configuration_dream.py      # Dream model configuration
+│   ├── generation_utils.py         # Diffusion-based generation logic
+│   ├── modeling_dream.py           # Core Dream model architecture
+│   └── tokenization_dream.py       # Tokenizer implementation for Dream
+├── generator.py                    # Inference logic
+├── trainer.py                      # Training logic (pretraining and SFT)
+└── utils.py                        # Auxiliary utilities and helper functions
+
+# example entry points for training / inference / evaluation
+examples/dream
+├── chat.py                         # Interactive inference example
+├── eval.sh                         # Automatic evaluation script
+├── generate.py                     # Inference example
+├── pt.py                           # Pretraining example
+├── README.md                       # Documentation (you are here)
+└── sft.py                          # Supervised finetuning example
+```
+<!-- > [!NOTE]
+>  We slightly modified [`modeling_dream.py`](/dllm/pipelines/dream/models/modeling_dream.py) so that the `model.forward()` supports 2-D attention masks. We recommend loading models with `dllm.utils.get_tokenizer`; otherwise `import dllm` before calling `AutoModel.from_pretrained` to ensure the correct models from `dllm` are used. 
+> 
+> We fixed bugs in `chat_template` and standardize `mask_token` through `dllm.utils.get_tokenizer`. If you use `AutoTokenizer`, keep in mind to set `chat_template` and `mask_token` appropriately yourselves. -->
+
+## Training
+
+### Finetuning
+For example, to SFT [`Dream-v0-Base-7B`](https://huggingface.co/Dream-org/Dream-v0-Base-7B) for instruction following on 8 GPUs, run:
+```shell
+accelerate launch \
+    --config_file scripts/accelerate_configs/fsdp.yaml \
+    examples/dream/sft.py \
+    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
+    --dataset_args "allenai/tulu-3-sft-mixture" \
+    --output_dir "models/Dream-7B-SFT/tulu-3-sft-mixture" \
+    --max_length 1024 \
+    --num_train_epochs 4 \
+    --learning_rate 2e-5
+```
+If you are using Slurm and want to train across multiple nodes, for example 2 nodes (16 GPUs total), run:
+```shell
+sbatch --nodes=2 --gres=gpu:8 scripts/train.slurm.sh \
+    --accelerate_config "fsdp" \
+    --script_path "examples/dream/sft.py" \
+    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
+    --dataset_args "allenai/tulu-3-sft-mixture" \
+    --output_dir "models/Dream-7B-SFT/tulu-3-sft-mixture" \
+    --max_length 1024 \
+    --num_train_epochs 4 \
+    --learning_rate 2e-5
+```
+
+<!-- **Reproducing [Dream-v0-Instruct-7B](https://huggingface.co/Dream-org/Dream-v0-Base-7B)**. We tried our best to reproduce Dream-v0-Instruct-7B by finetuning Dream-v0-Base-7B using our training pipeline on the public instruction-following dataset [allenai/tulu-3-sft-mixture](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture): -->
+#### Reproducing [`Dream-v0-Instruct-7B`](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B)
+We tried our best to reproduce [`Dream-v0-Instruct-7B`](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B) by finetuning [`Dream-v0-Base-7B`](https://huggingface.co/Dream-org/Dream-v0-Base-7B) using our training pipeline on the public instruction-following dataset [`allenai/tulu-3-sft-mixture`](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture):
+
+```shell
+# Preprocess the SFT data (optional, but avoids redundant preprocessing in multi-node training)
+PYTHONPATH=. python dllm/tools/preprocess_sft_dataset.py \
+    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
+    --sft_map_fn_path "examples.dream.sft.sft_map_fn" \
+    --dataset_args "allenai/tulu-3-sft-mixture" \
+    --output_dir "data/sft/dream/tulu-3-sft-mixture" \
+    --num_proc 64
+
+# Train on 24*8=192 A100s with FSDP; takes about 8 hours
+sbatch --nodes=24 --gres=gpu:8 scripts/train.slurm.sh \
+    --accelerate_config "fsdp" \
+    --script_path "examples/dream/sft.py" \
+    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
+    --dataset_args "data/sft/dream/tulu-3-sft-mixture" \
+    --load_preprocessed_data True \
+    --output_dir "models/Dream-7B-SFT-tulu3-fsdp-bs4-len2048-ep5-lr1e-5" \
+    --max_length 2048 \
+    --truncation "right" \
+    --group_by_length True \
+    --num_train_epochs 5 \
+    --learning_rate 1e-5 \
+    --per_device_train_batch_size 2 \
+    --gradient_accumulation_steps 2 \
+    --per_device_eval_batch_size 2 \
+    --eval_on_start False \
+    --eval_steps 0.1 \
+    --save_steps 0.05
+```
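The `--sft_map_fn_path` value in the preprocessing command is a dotted path: a module followed by an attribute name. A minimal sketch of how such a path could be resolved to a callable with the standard library is shown below. `resolve_dotted_path` is a hypothetical helper written for illustration; it is not necessarily how `preprocess_sft_dataset.py` loads the function.

```python
import importlib


def resolve_dotted_path(dotted_path):
    """Import `pkg.module.attr` and return `attr` (hypothetical helper)."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attribute', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Demonstrated with a standard-library function so the sketch runs anywhere.
# From the repository root, "examples.dream.sft.sft_map_fn" would be
# resolved the same way, provided the `examples` package is importable.
dumps = resolve_dotted_path("json.dumps")
print(dumps({"ok": True}))
```

Resolution of this kind only works when the top-level package is on the import path, which is presumably why the command sets `PYTHONPATH=.`.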
+<!-- [TODO] Training curves are on Wandb; checkpoints with evaluation results are available on Hugging Face. See the [Evaluation](#evaluation) section below for evaluation instructions. -->
+
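The batch-size and step flags in the reproduction command interact, and the arithmetic can be sketched as follows. The sketch assumes the script forwards these flags to Hugging Face `TrainingArguments`, where an `eval_steps` or `save_steps` value below 1 is read as a fraction of the total number of training steps. `NUM_EXAMPLES` is a placeholder chosen for illustration, not the real dataset size, and the step count is an approximation.

```python
import math


def effective_batch_size(num_gpus, per_device_batch_size, grad_accum_steps):
    """Sequences consumed per optimizer step across all GPUs."""
    return num_gpus * per_device_batch_size * grad_accum_steps


def interval_from_ratio(value, total_steps):
    """Turn a fractional eval/save setting into a step interval.

    Values in (0, 1) are treated as a fraction of total_steps (rounded up);
    values >= 1 are treated as an absolute number of steps.
    """
    if 0 < value < 1:
        return max(1, math.ceil(total_steps * value))
    return int(value)


# Values taken from the reproduction command: 24 nodes x 8 GPUs,
# --per_device_train_batch_size 2, --gradient_accumulation_steps 2.
global_batch = effective_batch_size(24 * 8, 2, 2)

NUM_EXAMPLES = 900_000  # placeholder, NOT the real dataset size
NUM_EPOCHS = 5          # --num_train_epochs 5

# Approximate step count; the trainer's own rounding may differ slightly.
steps_per_epoch = math.ceil(NUM_EXAMPLES / global_batch)
total_steps = steps_per_epoch * NUM_EPOCHS

print("global batch size:", global_batch)
print("approx. total optimizer steps:", total_steps)
print("evaluate every", interval_from_ratio(0.1, total_steps), "steps")
print("save every", interval_from_ratio(0.05, total_steps), "steps")
```

With 192 GPUs the global batch is 768 sequences per optimizer step, that is 4 per GPU, which appears to be what `bs4` in the output directory name refers to.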
+### Pretraining
+
+Pretrain on [`mlfoundations/dclm-baseline-1.0`](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0) from scratch using 192 GPUs (24x8) and FSDP:
+```shell
+sbatch --nodes=24 --gres=gpu:8 scripts/train.slurm.sh \
+    --accelerate_config "fsdp" \
+    --script_path "examples/dream/pt.py" \
+    --model_name_or_path "Dream-org/Dream-v0-Base-7B" \
+    --dataset_args "mlfoundations/dclm-baseline-1.0" \
+    --output_dir "models/Dream-7B-PT/dclm-baseline-1.0" \
+    --max_length 1024 \
+    --max_steps 2000 \
+    --learning_rate 3e-4
+```
+
+## Inference
+We support batch inference for standard generation and infilling:
+<!-- See [`examples/dream/generate.py`](/examples/dream/generate.py) for a full example: -->
+```shell
+python examples/dream/generate.py --model_name_or_path "Dream-org/Dream-v0-Instruct-7B"
+```
+We also support interactive multi-turn dialogue with visualization:
+```shell
+python examples/dream/chat.py --model_name_or_path "Dream-org/Dream-v0-Instruct-7B"
+```
+
+## Evaluation  
+> Read [(optional) Evaluation setup](/README.md/#optional-evaluation-setup) before running evaluation. 
+
+For example, to evaluate [`Dream-v0-Instruct-7B`](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B) on [`MMLU-Pro`](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro) using 4 GPUs, run:
+```shell
+# Use model_args to adjust the generation arguments for evaluation.
+accelerate launch --num_processes 4 \
+    dllm/pipelines/dream/eval.py \
+    --tasks "mmlu_pro" \
+    --model "dream" \
+    --apply_chat_template \
+    --num_fewshot 0 \
+    --model_args "pretrained=Dream-org/Dream-v0-Instruct-7B,mc_num=1,max_new_tokens=128,max_length=128,steps=128,temperature=0.1,top_p=0.9,add_bos_token=true,escape_until=true"
+```
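The `--model_args` value is a single string of comma-separated `key=value` pairs. When sweeping generation settings, it can be easier to edit a dictionary and rebuild the string than to edit the string by hand. The sketch below assumes that plain `key=value` convention with no commas inside values; `parse_model_args` and `build_model_args` are hypothetical helpers, not part of `dllm`.

```python
def parse_model_args(arg_string):
    """Split 'k1=v1,k2=v2' into an ordered dict of strings (hypothetical helper)."""
    args = {}
    for pair in arg_string.split(","):
        if not pair.strip():
            continue  # tolerate trailing commas
        key, value = pair.split("=", 1)
        args[key.strip()] = value.strip()
    return args


def build_model_args(args):
    """Inverse of parse_model_args: join the pairs back into one string."""
    return ",".join(f"{key}={value}" for key, value in args.items())


# The string used in the MMLU-Pro command.
base = (
    "pretrained=Dream-org/Dream-v0-Instruct-7B,mc_num=1,max_new_tokens=128,"
    "max_length=128,steps=128,temperature=0.1,top_p=0.9,"
    "add_bos_token=true,escape_until=true"
)

args = parse_model_args(base)
args["steps"] = "64"  # example override: fewer diffusion steps
print(build_model_args(args))
```

The printed string can be passed back to `--model_args` unchanged. Values are kept as strings here, so the evaluation script remains responsible for type conversion.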
+
+To automatically evaluate [`Dream-v0-Base-7B`](https://huggingface.co/Dream-org/Dream-v0-Base-7B) and [`Dream-v0-Instruct-7B`](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B) on all benchmarks, run:
+```shell
+bash examples/dream/eval.sh --model_name_or_path "Dream-org/Dream-v0-Instruct-7B" --instruct True
+bash examples/dream/eval.sh --model_name_or_path "Dream-org/Dream-v0-Base-7B" --instruct False
+```
+
+### Evaluation results
+
+> Results marked (evaluated) were obtained with our framework; results marked (reported) come from the original paper. All evaluation settings follow the configurations in the [Dream](https://github.com/DreamLM/Dream) repository, with minor adjustments. Placeholder entries (“–”) indicate results not yet evaluated; full results will be released soon.
+
+|                 | MMLU | BBH | ARC&#8209;C  | ARC&#8209;E | Hellaswag | WinoGrande | PIQA | GSM8K | Math | GPQA | HumanEval | MBPP | RACE | Countdown | Sudoku | Trip&nbsp;planning | 
+|:----------------|:-------:|:-------:|:-----:|:-----:|:-----------:|:------------:|:----:|:-----:|:----:|:----:|:-----------:|:----:|:------:|:-----------:|:----:|:-----------:|
+| [`Dream-v0-Base-7B`](https://huggingface.co/Dream-org/Dream-v0-Base-7B) (reported) | 69.5 | 57.9 | 59.9 | 83.9 | 73.3 | 74.8 | 75.8 | 77.2 | 39.6 | 36.6 | 57.9 | 56.2 | 44.7 | 16.0 | 81.0 | 17.8 |
+| [`Dream-v0-Base-7B`](https://huggingface.co/Dream-org/Dream-v0-Base-7B) (evaluated) | – | – | 59.7 | 83.3 | 73.1 | 72.9 | 72.0 | 69.6 | – | 35.5 | 45.8 | – | 43.0 | – | – | – |
+
+
+<p align="center" style="color: #808080; font-size: 0.9em;">
+Table 1. Evaluation results of 
+<a href="https://huggingface.co/Dream-org/Dream-v0-Base-7B" style="color: #808080; text-decoration: none;">
+<code>Dream-v0-Base-7B</code>
+</a>.
+</p>
+
+|  | MMLU | MMLU-Pro | GSM8K | Math | GPQA | HumanEval | MBPP | IFEval |
+|:----------------|:----:|:---------:|:-----:|:----:|:----:|:-----------:|:----:|:----:|
+| [`Dream-v0-Instruct-7B`](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B) (reported) | 67.0 | 43.3 | 81.0 | 39.2 | 33.0 | 55.5 | 58.8 | 62.5 |
+| [`Dream-v0-Instruct-7B`](https://huggingface.co/Dream-org/Dream-v0-Instruct-7B) (evaluated) | – | 43.0 | 82.6 | 39.9 | 32.4 | 59.1 | – | 62.3 |
+
+<p align="center" style="color: #808080; font-size: 0.9em;">
+Table 2. Evaluation results of 
+<a href="https://huggingface.co/Dream-org/Dream-v0-Instruct-7B" style="color: #808080; text-decoration: none;">
+<code>Dream-v0-Instruct-7B</code>
+</a>.
+</p>
+
+