This repository contains the code to reproduce the experiments from the paper 'Tailed U-Net: Multi-Scale Music Representation Learning', combining the repositories of Spijkervet & Burgoyne (2021) and Castellon et al. (2021).
The weights for the trained models are available here
The TUNe architecture:
Our main results, obtained with the representations of our two best-performing models on MTT, are:
Model | Supervised | # parameters (M) | Epochs | Fine-tune head | ROC-AUC | PR-AUC |
---|---|---|---|---|---|---|
TUNe | X | 2.1 | 10000 | Linear Classifier | 89.5 | 37.0 |
TUNe+ | X | 2.2 | 10000 | Linear Classifier | 89.3 | 37.1 |
CLMR | X | 2.4 | 10000 | Linear Classifier | 88.7 | 35.6 |
musicnn | ✓ | 11.8* | 10000 | Jukemir Probe | 90.7 | 38.4 |
\* Last reported parameter count is from Pons et al. (2017)
Model | MTT_{ROC} | MTT_{AP} | GTZAN | GiantSteps | EMO_{A} | EMO_{V} |
---|---|---|---|---|---|---|
TUNe | 90.3 | 38.1 | 67.6 | 13.7 | 60.5 | 55.7 |
TUNe+ | 90.3 | 38.0 | 64.5 | 15.5 | 64.7 | 45.9 |
CLMR | 89.4 | 36.1 | 68.6 | 14.9 | 67.8 | 45.8 |
Jukebox | 91.5 | 41.4 | 79.7 | 66.7 | 72.1 | 61.7 |
# cloning the repository
git clone https://github.com/Marcel-Velez/TUNe.git
# enter the cloned repository
cd TUNe
# install the dependencies
pip3 install -r requirements.txt
After these steps you can go two ways: first, we explain how to train our TUNe model(s) with the CLMR framework; second, we explain how to probe the trained models on the Jukemir tasks.
This segment is based on the repository of Janne Spijkervet (Spijkervet & Burgoyne (2021)). Parts 1 and 3 can be run on Windows; part 2 requires Linux or macOS because of the 'soundfile' backend needed for some of the audio augmentations used in the CLMR framework.
The parameter options and their default values can be found in './config/config.yaml'
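Conceptually, such YAML defaults are merged with command-line overrides before training starts. The sketch below illustrates that merge mechanism only; the parameter names and values are hypothetical, not the actual keys in 'config.yaml':

```python
import argparse

# Hypothetical defaults mirroring the *structure* of ./config/config.yaml;
# the real keys and values live in that file.
DEFAULTS = {"batch_size": 48, "max_epochs": 10000, "learning_rate": 3e-4}

def parse_config(argv=None):
    """Merge command-line overrides into the default configuration."""
    parser = argparse.ArgumentParser()
    for key, value in DEFAULTS.items():
        # Each default becomes a flag whose type matches the default value.
        parser.add_argument(f"--{key}", type=type(value), default=value)
    return vars(parser.parse_args(argv))

# Example: override only the batch size, keep the other defaults.
config = parse_config(["--batch_size", "96"])
```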
First, we download the datasets whose audio we want to use (default: MagnaTagATune)
# download a dataset and preprocess it: convert the audio to .mp3 at a fixed sample rate
python3 clmr_preprocess.py
# download a different dataset
python3 clmr_preprocess.py --data gtzan
Contrastive learning of musical representations
# download a dataset and preprocess the data, i.e. convert it to .mp3 at a specific sample rate
python3 clmr_preprocess.py
# train a model (default: TunePlus) on an Apple M1 chip (the 'mps' accelerator)
python3 clmr_main.py --accelerator mps --workers 10
# continue training from a checkpoint
# runs are automatically checkpointed when training in './runs/lightning_logs/XXX'
python3 clmr_main.py --checkpoint_path ./runs/lightning_logs/XXX/checkpoints/epoch=YYYY-step=ZZZZZ.ckpt --accelerator mps --workers 10
In order to evaluate a learned representation, one has to pass along the path to the trained model:
# run the evaluation script for a given checkpoint of a trained model
python3 clmr_linear_evaluation.py --checkpoint_path ./runs/lightning_logs/XXX/checkpoints/epoch=YYYY-step=ZZZZZ.ckpt
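Linear evaluation keeps the encoder frozen and trains only a linear classifier on top of the learned representations. A self-contained sketch of that idea on toy features (this is an illustration of the principle, not the actual code in 'clmr_linear_evaluation.py'):

```python
import numpy as np

def train_linear_probe(feats, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression probe on frozen representations.

    Only the linear layer (w, b) is trained; the features stand in for
    the fixed output of a pretrained encoder.
    """
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=feats.shape[1])
    b = 0.0
    for _ in range(epochs):
        logits = feats @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels                     # binary cross-entropy gradient
        w -= lr * feats.T @ grad / len(feats)
        b -= lr * grad.mean()
    return w, b

# Toy "representations": two linearly separable clusters.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(-1, 0.3, (50, 8)), rng.normal(1, 0.3, (50, 8))])
labels = np.array([0] * 50 + [1] * 50)
w, b = train_linear_probe(feats, labels)
acc = ((feats @ w + b > 0) == labels).mean()
```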
This section explains how the repository of Castellon et al. (2021) can be used on CLMR-trained models. The code has been slightly altered and combined from the original repository. All of these steps can be executed on Windows, macOS, and Linux.
The jukemir_preprocess.sh bash script loops over the four datasets, first downloading each dataset and then converting the audio to the same format and sample rate. The audio is stored in the './data/DATASET_NAME' directory. NOTE: MagnaTagATune is shared by both Jukemir and CLMR, so it will not be downloaded twice.
# execute the bash file
bash jukemir_preprocess.sh
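"Converting the audio to the same sample rate" means re-expressing each waveform on a common time grid. A naive linear-interpolation sketch of that operation (the actual script relies on proper audio tooling, not this code):

```python
import numpy as np

def resample(signal, sr_in, sr_out):
    """Naive linear-interpolation resampler from sr_in Hz to sr_out Hz."""
    duration = len(signal) / sr_in
    n_out = int(round(duration * sr_out))
    t_in = np.arange(len(signal)) / sr_in    # original sample times
    t_out = np.arange(n_out) / sr_out        # target sample times
    return np.interp(t_out, t_in, signal)

# One second of a 440 Hz tone at 44100 Hz, resampled to 16000 Hz.
x = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
y = resample(x, 44100, 16000)
```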
The default locations for the Tune5Tail and TunePlus models are './checkpoints/tune_5_tail_epoch_10000.ckpt' and './checkpoints/tune_plus_epoch_10000.ckpt'. For custom models, one has to add the corresponding import to the Python file and specify the path to the trained custom model checkpoint
# extract representations for the MagnaTagATune dataset for the default model, TunePlus
python3 jukemir_extract_representations --data magna
# extract representations for the Emomusic dataset for the default model, TunePlus
python3 jukemir_extract_representations --data emomu
# extract representations for the GTZAN fault-filtered dataset for the Tune5Tail model
python3 jukemir_extract_representations --data gtzan --model Tune5Tail
# extract representations for the giantsteps dataset for a custom model
python3 jukemir_extract_representations --data giant --model custom_model --checkpoint ./path/to/checkpoint
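Conceptually, extracting a representation means running each clip through the frozen encoder and pooling the per-frame embeddings into one vector per track. A minimal sketch (mean pooling and the dimensions are assumptions for illustration, not the exact behaviour of the extraction script):

```python
import numpy as np

def extract_representation(frame_embeddings):
    """Pool per-frame encoder outputs into a single clip-level vector.

    frame_embeddings: array of shape (n_frames, dim), as a frozen encoder
    might produce; mean pooling is one common choice for probing.
    """
    return frame_embeddings.mean(axis=0)

# Stand-in for encoder output on one clip: 59 frames of 512-d embeddings.
frames = np.random.default_rng(0).normal(size=(59, 512))
rep = extract_representation(frames)
```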
Extracting the representations took a couple of minutes for the Emomusic dataset, around 2-3 hours for MagnaTagATune and GTZAN, and up to 10 hours for the GiantSteps dataset.
This performs sequential training of the probes; the original authors also offer a parallel training setup in their repository. 'jukemir_train_probes.py' first creates configurations for every dataset according to the probe parameter configuration of Castellon et al. (2021); then every probe is trained for every dataset (216 probes per dataset).
# run the probe training script
python3 jukemir_train_probes.py
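The probe configurations form a Cartesian product over a hyperparameter grid. The grid below is hypothetical (the real values follow Castellon et al. (2021)), but it is chosen so the product matches the 216 probes per dataset mentioned above:

```python
from itertools import product

# Hypothetical hyperparameter grid: 2 * 2 * 2 * 3 * 3 * 3 = 216 combinations.
GRID = {
    "standardize": [True, False],
    "hidden_layer": [None, 512],
    "weight_decay": [0.0, 1e-4],
    "learning_rate": [1e-5, 1e-4, 1e-3],
    "dropout": [0.0, 0.25, 0.5],
    "batch_size": [64, 128, 256],
}

def make_probe_configs(grid):
    """Expand the grid into one config dict per probe."""
    keys = list(grid)
    return [dict(zip(keys, vals)) for vals in product(*grid.values())]

configs = make_probe_configs(GRID)
```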
# evaluate the performance of the best probe configuration for each of the 4 datasets
bash jukemir_evaluate.sh
- Changed part of the GTZAN code in "clmr/datasets/gtzan.py" because the download link for the GTZAN dataset was unavailable at the time of writing.
- Added checkpoint saving every 200 epochs to "clmr.XXXX.contrastivelearning.py" (lines 35/36).