# Generative BERT
This directory provides two key sets of resources:
- Toy Examples (Warmup): Scripts for pretraining and SFTing any BERT-style model on small datasets to generate text.
- Official Scripts (BERT Chat): The exact training, inference, and evaluation scripts used to create the `ModernBERT-base-chat-v0` and `ModernBERT-large-chat-v0` checkpoints, two BERTs finetuned as chatbots. For a deep dive into experimental results, lessons learned, and more reproduction details, please see our full BERT Chat W&B Report.
Chat with ModernBERT-large-chat-v0. See Inference for details.
## Files overview
```
# example entry points for training / inference / evaluation
examples/bert
├── chat.py      # Interactive inference example
├── eval.sh      # Automatic evaluation script
├── generate.py  # Inference example
├── pt.py        # Pretraining example
├── README.md    # Documentation (you are here)
└── sft.py       # Supervised finetuning example
```
## Warmup
In this section, we show toy examples of pretraining and SFTing ModernBERT-large on small datasets to generate text.
You can use any BERT-style model instead, for example by passing `--model_name_or_path "FacebookAI/roberta-large"`.
### Pretrain
To train ModernBERT-large on the tiny-shakespeare dataset, run:
```bash
accelerate launch --config_file scripts/accelerate_configs/ddp.yaml --num_processes 1 \
    examples/bert/pt.py \
    --model_name_or_path "answerdotai/ModernBERT-large" \
    --dataset_args "Trelis/tiny-shakespeare" \
    --text_field "Text" \
    --insert_eos False \
    --max_length 128 \
    --num_train_epochs 20 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 64 \
    --save_steps 0.1 \
    --output_dir "models/ModernBERT-large/tiny-shakespeare"
```
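For intuition, here is a minimal sketch of the data side of this command: loading `Trelis/tiny-shakespeare` and tokenizing its `Text` field at length 128, mirroring the flags above. This is illustrative only; `pt.py`'s actual preprocessing (e.g., sequence packing and `--insert_eos` handling) may differ.

```python
# Sketch only: tokenize tiny-shakespeare's "Text" field the way the flags
# above suggest; pt.py's real preprocessing may pack sequences, handle EOS, etc.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-large")
ds = load_dataset("Trelis/tiny-shakespeare", split="train")

def tokenize(batch):
    # "Text" matches --text_field; 128 matches --max_length.
    return tokenizer(batch["Text"], truncation=True, max_length=128)

ds = ds.map(tokenize, batched=True, remove_columns=ds.column_names)
print(len(ds), "examples; first ids:", ds[0]["input_ids"][:10])
```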
To run inference with the model:
```bash
# just press enter (empty prompt) if you want the model to generate text from scratch
python -u examples/bert/chat.py \
    --model_name_or_path "models/ModernBERT-large/tiny-shakespeare/checkpoint-final" \
    --chat False --remasking "random" --steps 128 --max_new_tokens 128
```
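For intuition about what `--steps` and `--remasking` do: a masked LM generates by starting from a fully masked canvas and committing tokens over many denoising steps. Below is a minimal, illustrative sketch of such a loop with a `random` strategy; it is not the repo's actual decoding code.

```python
# Illustrative mask-diffusion decoding loop (not the repo's implementation).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

path = "models/ModernBERT-large/tiny-shakespeare/checkpoint-final"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForMaskedLM.from_pretrained(path).eval()

max_new_tokens, steps = 128, 128
mask_id = tokenizer.mask_token_id

# Start from a fully masked canvas (an "empty prompt" generation).
ids = torch.full((1, max_new_tokens), mask_id, dtype=torch.long)
for _ in range(steps):
    still_masked = ids == mask_id
    if not still_masked.any():
        break
    with torch.no_grad():
        pred = model(input_ids=ids).logits.argmax(-1)
    # "random" strategy: commit one randomly chosen masked position per step.
    # (Confidence-based strategies commit the most certain positions first.)
    masked_idx = still_masked[0].nonzero().squeeze(-1)
    pick = masked_idx[torch.randint(len(masked_idx), (1,))]
    ids[0, pick] = pred[0, pick]

print(tokenizer.decode(ids[0], skip_special_tokens=True))
```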
### SFT
To train ModernBERT-large on the alpaca dataset, run:
```bash
accelerate launch --config_file scripts/accelerate_configs/ddp.yaml --num_processes 8 \
    examples/bert/sft.py \
    --model_name_or_path "answerdotai/ModernBERT-large" \
    --dataset_args "tatsu-lab/alpaca" \
    --max_length 512 \
    --num_train_epochs 20 \
    --per_device_train_batch_size 64 \
    --per_device_eval_batch_size 64 \
    --save_steps 0.1 \
    --output_dir "models/ModernBERT-large/alpaca"
```
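To see what one training example might look like after chat formatting, here is a sketch over a single `tatsu-lab/alpaca` record. The chat template below is a made-up placeholder (the base ModernBERT tokenizer ships no chat template), and `sft.py`'s actual formatting may differ.

```python
# Sketch only: render one alpaca record as chat-formatted text.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-large")
# Placeholder template; sft.py's real template may differ.
tokenizer.chat_template = (
    "{% for m in messages %}<|{{ m['role'] }}|>{{ m['content'] }}{% endfor %}"
)

ex = load_dataset("tatsu-lab/alpaca", split="train")[0]
prompt = ex["instruction"] + ("\n\n" + ex["input"] if ex["input"] else "")
messages = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": ex["output"]},
]
print(tokenizer.apply_chat_template(messages, tokenize=False))
```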
To chat with the model:
```bash
python -u examples/bert/chat.py \
    --model_name_or_path "models/ModernBERT-large/alpaca/checkpoint-final" --chat True
```
## BERT Chat
Here we show the exact commands we use to train and interact with the BERT Chat models: `ModernBERT-base-chat-v0` and `ModernBERT-large-chat-v0`. For training curves and other details, please see the BERT Chat W&B Report.
### Training
To reproduce ModernBERT-base-chat-v0, run:
```bash
# 8 processes x 48 per-device = effective batch size 384 (the "bs-384" in the output path)
accelerate launch --config_file scripts/accelerate_configs/zero2.yaml --num_processes 8 \
    examples/bert/sft.py \
    --model_name_or_path "answerdotai/ModernBERT-base" \
    --dataset_args "allenai/tulu-3-sft-mixture|HuggingFaceTB/smoltalk" \
    --max_length 1024 \
    --num_train_epochs 10 \
    --per_device_train_batch_size 48 \
    --per_device_eval_batch_size 48 \
    --save_steps 0.1 \
    --output_dir "models/ModernBERT-base/tulu-3-smoltalk/epochs-10-bs-384-len-1024"
```
To reproduce ModernBERT-large-chat-v0, run:
```bash
accelerate launch --config_file scripts/accelerate_configs/zero2.yaml --num_processes 8 \
    examples/bert/sft.py \
    --model_name_or_path "answerdotai/ModernBERT-large" \
    --dataset_args "allenai/tulu-3-sft-mixture|HuggingFaceTB/smoltalk" \
    --max_length 1024 \
    --num_train_epochs 10 \
    --per_device_train_batch_size 48 \
    --per_device_eval_batch_size 48 \
    --save_steps 0.1 \
    --output_dir "models/ModernBERT-large/tulu-3-smoltalk/epochs-10-bs-384-len-1024"
```
### Inference
To chat with the model:
```bash
python -u examples/bert/chat.py --model_name_or_path "dllm-collection/ModernBERT-large-chat-v0" --chat True
```
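The released checkpoints should also load as ordinary Hugging Face masked LMs (assuming they keep ModernBERT's MLM head), so you can sanity-check one outside `chat.py` with a single mask-fill:

```python
# Quick sanity check: single-token mask filling with the released checkpoint.
# Assumes the model keeps a standard MLM head; chat.py layers the full
# diffusion-style decoding loop on top of this.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "dllm-collection/ModernBERT-large-chat-v0"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name).eval()

text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
print(tokenizer.decode(logits[0, mask_pos].argmax().item()))
```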
### Evaluation
Before running evaluation, read the (optional) Evaluation setup.
For example, to evaluate ModernBERT-large-chat-v0 on MMLU-Pro using 4 GPUs, run:
```bash
# Use model_args to adjust the generation arguments for evaluation.
accelerate launch --num_processes 4 \
    dllm/pipelines/bert/eval.py \
    --tasks "mmlu_pro" \
    --model "bert" \
    --apply_chat_template \
    --num_fewshot 0 \
    --model_args "pretrained=dllm-collection/ModernBERT-large-chat-v0,is_check_greedy=False,mc_num=1,max_new_tokens=256,steps=256,block_length=256"
```
To automatically evaluate ModernBERT-base-chat-v0 and ModernBERT-large-chat-v0 on all benchmarks, run:
```bash
bash examples/bert/eval.sh --model_name_or_path "dllm-collection/ModernBERT-base-chat-v0"
bash examples/bert/eval.sh --model_name_or_path "dllm-collection/ModernBERT-large-chat-v0"
```
### Evaluation results
| Model | LAMBADA | GSM8K | CEval | BBH | MATH | MMLU | Winogrande | HellaSwag | CMMLU |
|---|---|---|---|---|---|---|---|---|---|
| ModernBERT-base-chat-v0 (evaluated) | 49.3 | 5.9 | 25.0 | 17.9 | 3.1 | 26.1 | 49.7 | 41.0 | 24.3 |
| ModernBERT-large-chat-v0 (evaluated) | 46.3 | 17.1 | 24.6 | 25.1 | 3.8 | 33.5 | 53.1 | 45.0 | 27.5 |
| Qwen1.5-0.5B (reported & evaluated) | 48.6 | 22.0 | 50.5 | 18.3 | 3.1 | 39.2 | 55.0 | 48.2 | 46.6 |
| Qwen1.5-0.5B-Chat (reported & evaluated) | 41.2 | 11.3 | 37.2 | 18.2 | 2.1 | 35.0 | 52.0 | 36.9 | 32.2 |
| gpt2 (reported & evaluated) | 46.0 | 0.7 | 24.7 | 6.9 | 1.8 | 22.9 | 51.6 | 31.1 | 25.2 |
| gpt2-medium (reported & evaluated) | 55.5 | 2.1 | 24.6 | 17.8 | 1.4 | 22.9 | 53.1 | 39.4 | 0.3 |
Table 1. Evaluation results of `ModernBERT-base-chat-v0`, `ModernBERT-large-chat-v0`, `Qwen1.5-0.5B`, `Qwen1.5-0.5B-Chat`, `gpt2`, and `gpt2-medium`. Underlined entries are results from official reports: the GPT-2 paper, the Qwen 1.5 blog, and the Qwen2-0.5B-Instruct model card. All other results are evaluated using our framework.
