Files
MIDIFoundationModel/readme.md

165 lines
5.1 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# 🎵 Amadeus: Autoregressive Model with Bidirectional Attribute Modelling for Symbolic Music
<p align="center">
<a href="https://huggingface.co/longyu1315/Amadeus-S">
<img src="https://img.shields.io/badge/🤗-Amadeus--S-yellow" alt="HuggingFace">
</a>
<a href="https://arxiv.org/abs/2508.20665">
<img src="https://img.shields.io/badge/arXiv-2508.20665-blue" alt="arXiv">
</a>
</p>
**Amadeus** is a novel **symbolic music (MIDI) generation framework**. We use **autoregressive modeling** for note sequences, **discrete diffusion models** for intra-note attributes, and **representation optimization** to enhance model performance. Compared to current mainstream autoregressive or hierarchical autoregressive models, Amadeus achieves significant improvements in **generation quality, speed, and controllability**. While significantly improving generation quality, we have achieved a speedup of at least **4x** compared to pure autoregressive models. We also support a training-free **fine-grained attribute control** mechanism, which endows Amadeus with maximum flexibility. We will continuously update the **code, models, and datasets**.
***
## 🏗️ Model Architecture
<p align="center">
<img src="assets/amadeus-framwork.drawio.png" alt="Amadeus architecture" width="600">
</p>
***
## 📅 Changelog
* 2025-08-28: Released inference code and the **Amadeus-S** model
***
## ⚙️ Installation and Usage
Set up the environment (inference only):
!!! 训练使用environment yml创建环境
conda env create -f environment.yml
```bash
conda create -n amadeus_slim python=3.10
conda activate amadeus_slim
pip install -r demo/requirements.txt
```
First run:
```bash
# Chinese interface
python demo/Amadeus_app_CN.py
# English interface
python demo/Amadeus_app_EN.py
```
> Note:
>
> `Amadeus_app_CN.py`
>
> is for the Chinese interface, and
>
> `Amadeus_app_EN.py`
>
> is for the English interface.
👉 The model will be automatically downloaded to the `models/` folder, which includes a usable **soundfont**. Please modify the path of `DEFAULT_SOUND_FONT` in `Amadeus/symbolic_encoding/``midi2audio.py`.
Example of command-line generation:
```
python generate.py -wandb\_exp\_dir models/Amadeus-S -text\_encoder\_model google/flan-t5-base -temperature 2 -prompt "A lively and melodic pop rock song featuring piano, overdriven guitar, electric drum and electric bass, set in a fast 4/4 tempo and the key of C# minor, with a frequently recurring chord progression of D, A, C#m, and F# that evokes a mix of emotion and love."
```
***
## 📂 Repository Structure
```
Amadeus/
├── demo/ # Example scripts and interfaces (CN/EN)
├── Amadeus/ # Core model and symbolic encoding
├── assets/ # Architecture diagrams and sample audio files
├── data\_representation # Data processing
├── models/ # Downloaded or cached pre-trained models
└── generate.py # Command-line generation entry point
```
***
## 📊 Evaluation Results
We evaluated **generation speed, text alignment, and note attribute control accuracy** on the **MidiCaps** dataset. The results are as follows:
| Model | Speed (notes/s) | CLAP ↑ | TBT ↑ | CK ↑ | CTS ↑ | CI ↑ | CMtop3 ↑ |
| -------------- | --------------- | -------- | --------- | --------- | --------- | --------- | --------- |
| Text2Midi | 4.02 | 0.19 | 31.76 | 22.22 | 84.15 | 19.92 | 60.57 |
| MuseCoco | 1.67 | 0.19 | 34.21 | 14.66 | 94.24 | 22.42 | 38.18 |
| T2M-inferalign | 4.02 | 0.20 | 39.32 | 29.80 | 84.32 | 20.13 | 47.74 |
| **Amadeus** | **16.23** | 0.20 | 73.93 | 39.31 | 96.98 | 26.01 | 65.52 |
| **Amadeus-M** | 10.51 | **0.21** | **76.31** | **43.07** | **97.02** | **27.11** | **66.39** |
***
## 🤝 Acknowledgements and Contributions
The development of Amadeus is inspired by the music and AI communities, with the goal of **serving music creators, not replacing them**.
We welcome developers and researchers to contribute code or provide suggestions — please reach out to us via **Issues** or **Pull Requests**.
Part of the design of this project references [JudeJiwoo/nmt](https://github.com/JudeJiwoo/nmt), and we would like to express our gratitude here 🙏.
***
## ⚠️ Notes
The current model is relatively small and may not always generate MIDI that fully matches the description.
You can try **slightly adjusting parameters such as temperature or top-p** to improve the results.
We will continue to improve the model to provide more stable and higher-quality generation.
***
## 📚 Citation
If you find Amadeus helpful for your research or createplease cite our paper:
```bibtex
@article{su2025amadeus,
title = {Amadeus: Autoregressive Model with Bidirectional Attribute Modelling for Symbolic Music},
author = {Su, Hongju and Li, Ke and Yang, Lan and Zhang, Honggang and Song, Yi-Zhe},
journal = {arXiv preprint arXiv:2508.20665},
year = {2025}
}