This repository describes two datasets, NSynth-100 and Free-sound clips of 89 classes (FSC-89), which were proposed in the following paper:
Few-shot Class-incremental Audio Classification Using Adaptively-refined Prototypes. INTERSPEECH, 2023.
Wei Xie, Yanxiong Li, Qianhua He, Wenchang Cao, and Tuomas Virtanen
Motivation for constructing the datasets:
To study the Few-shot Class-incremental Audio Classification (FCAC) problem, we constructed the NSynth-100 and FSC-89 datasets from subsets of the NSynth dataset and the FSD-MIX-CLIPS dataset, respectively.
Table of Contents
- Statistics on the datasets
- Preparation of the NSynth-100 dataset
- Preparation of the FSC-89 dataset
- Acknowledgment
- Contact
- Citation
| | NSynth-100 | FSC-89 |
|---|---|---|
| Type of audio | Musical instruments | Free sound |
| Num. of classes | 100 (55 base classes, 45 novel classes) | 89 (59 base classes, 30 novel classes) |
| Num. of training / validation / testing samples per base class | 200 / 100 / 100 | 800 / 200 / 200 |
| Num. of training / validation / testing samples per novel class | 100 / none / 100 | 500 / none / 200 |
| Duration of each sample | 4 seconds | 1 second |
| Sampling frequency | 16 kHz | 44.1 kHz |
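The per-class figures in the table can be turned into overall split sizes. The sketch below is only arithmetic on the numbers listed above; it assumes every class contributes exactly the listed number of samples, and the dictionary layout and function name are illustrative, not part of the released code.

```python
# Per-class sample counts copied from the statistics table above.
# "val": 0 stands for the "none" entries of the novel classes.
DATASETS = {
    "NSynth-100": {
        "base": {"classes": 55, "train": 200, "val": 100, "test": 100},
        "novel": {"classes": 45, "train": 100, "val": 0, "test": 100},
    },
    "FSC-89": {
        "base": {"classes": 59, "train": 800, "val": 200, "test": 200},
        "novel": {"classes": 30, "train": 500, "val": 0, "test": 200},
    },
}


def split_totals(spec):
    """Sum the samples of each split over the base and novel class groups."""
    totals = {"train": 0, "val": 0, "test": 0}
    for group in spec.values():
        for split in totals:
            totals[split] += group["classes"] * group[split]
    return totals


for name, spec in DATASETS.items():
    totals = split_totals(spec)
    print(name, totals, "total:", sum(totals.values()))
```

Under that assumption, the table implies 31,000 samples in total for NSynth-100 and 91,800 for FSC-89.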
The NSynth dataset is an audio dataset containing 306,043 musical notes, each with a unique pitch, timbre, and envelope. These notes come from 1,006 musical instruments.
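NSynth audio file names such as `bass_acoustic_000-024-025.wav` follow the NSynth naming convention `<family>_<source>_<index>-<pitch>-<velocity>`, so the instrument a note belongs to can be recovered from the name alone. The helper below is a minimal sketch of that parsing; `NoteInfo` and `parse_note_filename` are illustrative names, not functions from this repository.

```python
from pathlib import Path
from typing import NamedTuple


class NoteInfo(NamedTuple):
    instrument: str  # e.g. "bass_acoustic_000"
    family: str      # e.g. "bass"
    source: str      # e.g. "acoustic"
    pitch: int       # MIDI pitch
    velocity: int    # MIDI velocity


def parse_note_filename(filename: str) -> NoteInfo:
    """Split an NSynth file name into instrument, pitch, and velocity."""
    stem = Path(filename).stem
    # Split from the right so that underscores inside the family name
    # (e.g. "synth_lead") do not break the parsing.
    instrument, pitch, velocity = stem.rsplit("-", 2)
    family, source, _index = instrument.rsplit("_", 2)
    return NoteInfo(instrument, family, source, int(pitch), int(velocity))


print(parse_note_filename("bass_acoustic_000-024-025.wav"))
```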
Before constructing the NSynth-100 dataset, we first ran a statistical analysis of the NSynth dataset; see here.
Based on the results of that analysis, the NSynth-100 dataset is obtained with the following steps:
- Download the train, valid, and test sets of the NSynth dataset to your local machine and unzip them. You should get the following directory structure:
Your dataset root (NSynth_audio_for_FCAC)
├── nsynth-train                # Training set of the NSynth dataset
│   ├── audio
│   │   ├── bass_acoustic_000-024-025.wav
│   │   └── ....
│   └── examples.json           # meta file of the training set
│
├── nsynth-val                  # Validation set of the NSynth dataset
│   ├── audio
│   │   ├── bass_electronic_018-022-025.wav
│   │   └── ....
│   └── examples.json
│
└── nsynth-test                 # Test set of the NSynth dataset
    ├── audio
    │   ├── bass_electronic_018-022-100.wav
    │   └── ....
    └── examples.json
- Download the meta files for FCAC from here to your local machine and unzip them. You should get the following directory structure:
Your dataset root (NSynth_meta_for_FCAC)
├── nsynth-100-fs-meta
│   ├── nsynth-100-fs_train.csv   # containing information of all training samples from the base and novel classes
│   ├── nsynth-100-fs_val.csv     # containing information of all validation samples from the base classes
│   ├── nsynth-100-fs_test.csv    # containing information of all test samples from the old and novel classes
│   └── nsynth-100-fs_vocab.json  # label vocabulary of the dataset
│
├── nsynth-200-fs-meta
│   ├── nsynth-200-fs_train.csv
│   ├── nsynth-200-fs_val.csv
│   ├── nsynth-200-fs_test.csv
│   └── nsynth-200-fs_vocab.json
│
├── nsynth-300-fs-meta
│   ├── nsynth-300-fs_train.csv
│   ├── nsynth-300-fs_val.csv
│   ├── nsynth-300-fs_test.csv
│   └── nsynth-300-fs_vocab.json
│
└── nsynth-400-fs-meta
    ├── nsynth-400-fs_train.csv
    ├── nsynth-400-fs_val.csv
    ├── nsynth-400-fs_test.csv
    └── nsynth-400-fs_vocab.json
- Run the following script to load the NSynth-100 dataset:
python Load_nsynth_data_for_FCAC.py --metapath path to NSynth_meta_for_FCAC folder --audiopath path to NSynth_audio_for_FCAC folder --num_class 100 --base_class 55
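Before running the loader, it can help to confirm that both downloads were unpacked into the layouts shown in the steps above. The following is a minimal sketch, not part of the repository: the folder and file names are taken from the two directory trees, while the function names and the example root paths are assumptions to adapt to your machine.

```python
from pathlib import Path

AUDIO_SPLITS = ("nsynth-train", "nsynth-val", "nsynth-test")
META_SPLITS = ("train", "val", "test")


def expected_audio_paths(audio_root):
    """Paths that should exist under the NSynth audio root."""
    root = Path(audio_root)
    paths = []
    for split in AUDIO_SPLITS:
        paths.append(root / split / "audio")
        paths.append(root / split / "examples.json")
    return paths


def expected_meta_paths(meta_root, num_class=100):
    """Paths that should exist under the FCAC meta root for one class count."""
    folder = Path(meta_root) / f"nsynth-{num_class}-fs-meta"
    prefix = f"nsynth-{num_class}-fs"
    paths = [folder / f"{prefix}_{split}.csv" for split in META_SPLITS]
    paths.append(folder / f"{prefix}_vocab.json")
    return paths


def missing_paths(paths):
    """Return the subset of paths that does not exist on disk."""
    return [p for p in paths if not p.exists()]


# Example roots (assumed locations; replace with your own).
audio_root = "NSynth_audio_for_FCAC"
meta_root = "NSynth_meta_for_FCAC"
to_check = expected_audio_paths(audio_root) + expected_meta_paths(meta_root, 100)
for path in missing_paths(to_check):
    print("missing:", path)
```

The same check covers the 200-, 300-, and 400-class meta folders by changing `num_class`.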
- Since the FSC-89 dataset is extracted from the FSD-MIX-CLIPS dataset, you need to prepare the FSD-MIX-CLIPS dataset first. See the instructions here.
- Download the meta files of the FSC-89 dataset from here. You should get the following directory structure:
FSC-89-meta
├── setup1
│   ├── Fsc89-setup1-fsci_train.csv
│   ├── Fsc89-setup1-fsci_val.csv
│   └── Fsc89-setup1-fsci_test.csv
│
└── setup2
    ├── Fsc89-setup2-fsci_train.csv
    ├── Fsc89-setup2-fsci_val.csv
    └── Fsc89-setup2-fsci_test.csv
- Run the following script to load the FSC-89 dataset:
python load_fsc_89_data_for_FCAC.py --metapath path to FSC-89-meta folder \
--datapath path to FSD-MIX-CLIPS_data folder --data_type audio --setup setup1
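To get a quick look at the FSC-89 meta files before loading the audio, the sketch below resolves the three CSV paths for a given setup and reports each file's header and row count. It is an illustration only: the file names come from the directory tree above, the function names are hypothetical, and it assumes each CSV starts with a header row (the column names themselves are not assumed).

```python
import csv
from pathlib import Path

SETUPS = ("setup1", "setup2")
SPLITS = ("train", "val", "test")


def fsc89_meta_paths(meta_root, setup="setup1"):
    """Map each split name to its meta CSV path for the chosen setup."""
    if setup not in SETUPS:
        raise ValueError(f"setup must be one of {SETUPS}, got {setup!r}")
    folder = Path(meta_root) / setup
    return {split: folder / f"Fsc89-{setup}-fsci_{split}.csv" for split in SPLITS}


def summarize_csv(path):
    """Return (header, number of data rows), assuming the first row is a header."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)
    return header, n_rows


# Example root (assumed location; replace with your own).
for split, path in fsc89_meta_paths("FSC-89-meta", "setup1").items():
    if path.exists():
        header, n_rows = summarize_csv(path)
        print(split, "columns:", header, "rows:", n_rows)
    else:
        print(split, "not found:", path)
```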
Our project references code from the following repositories.
If you find this repository helpful, please consider citing:
@inproceedings{xie23b_interspeech,
author={Wei Xie and Yanxiong Li and Qianhua He and Wenchang Cao and Tuomas Virtanen},
title={{Few-shot Class-incremental Audio Classification Using Adaptively-refined Prototypes}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
pages={301--305},
doi={10.21437/Interspeech.2023-1380}
}