
Speaker Classification

Hands-on Practice

This is a hands-on Transformer exercise. The code is in the GitHub repository, and the steps follow the guide in that repo.


Overview

The task is to classify the speaker from the given audio features. Along the way, you learn how to use a Transformer and how to tune its parameters.
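To make the tunable parameters concrete, here is a minimal sketch of a Transformer-based classifier in PyTorch. It is an illustration under stated assumptions, not the repository's model: the class name `SpeakerClassifier`, the 40 mel bins, and all default sizes are hypothetical, and mean pooling over time is just one simple way to get an utterance-level vector. The 600 output classes correspond to the 600 speakers in the dataset.

```python
import torch
import torch.nn as nn


class SpeakerClassifier(nn.Module):
    """Hypothetical sketch: mel frames -> Transformer encoder -> speaker logits."""

    def __init__(self, n_mels=40, d_model=80, nhead=2, num_layers=2,
                 dim_feedforward=256, n_speakers=600, dropout=0.1):
        super().__init__()
        # Project each mel frame (n_mels is an assumed value) to the model width.
        self.prenet = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model,          # must be divisible by nhead
            nhead=nhead,
            dim_feedforward=dim_feedforward,
            dropout=dropout,
            batch_first=True,         # inputs are (batch, frames, features)
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, n_speakers)

    def forward(self, mels):
        x = self.prenet(mels)         # (batch, frames, d_model)
        x = self.encoder(x)           # (batch, frames, d_model)
        x = x.mean(dim=1)             # average over time -> (batch, d_model)
        return self.head(x)           # (batch, n_speakers)


model = SpeakerClassifier()
logits = model(torch.randn(4, 128, 40))  # 4 random segments of 128 frames
print(logits.shape)
```

The knobs most worth experimenting with are `d_model`, `nhead`, `num_layers`, `dim_feedforward`, and `dropout`.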

Dataset

The original dataset is VoxCeleb1.

We randomly select 600 speakers from VoxCeleb1 and preprocess the raw waveforms into mel-spectrograms. The preprocessed dataset can be downloaded from Google Drive.


Arguments:

  • data_dir: The path to the data directory.

  • metadata_path: The path to the metadata.

  • segment_len: The length of each audio segment used for training.
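The role of segment_len can be illustrated with a random-crop helper. This is a minimal sketch, not the repository's code: the function name `random_segment` is hypothetical, and it assumes a mel-spectrogram is a sequence of frames indexed along its first axis (a plain list here; a PyTorch tensor of shape (frames, n_mels) slices the same way). In this sketch, utterances shorter than segment_len are returned unchanged.

```python
import random


def random_segment(mel, segment_len, rng=random):
    """Return a random window of `segment_len` consecutive frames from `mel`.

    `mel` is any sequence sliceable along its first axis (hypothetical helper,
    not taken from the repository). Shorter inputs are returned unchanged.
    """
    n_frames = len(mel)
    if n_frames <= segment_len:
        return mel
    # Pick a start index so the window fits entirely inside the utterance.
    start = rng.randint(0, n_frames - segment_len)
    return mel[start:start + segment_len]


# Toy "spectrogram": 10 frames with 2 mel bins each.
toy_mel = [[float(i), float(i) + 0.5] for i in range(10)]
segment = random_segment(toy_mel, segment_len=4)
print(len(segment))
```

Cropping every utterance to the same length lets segments be stacked into fixed-size batches during training.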

The layout of the dataset directory is shown below, where each uttr-{random string}.pt is a PyTorch data file containing a valid mel-spectrogram.

data directory/
├── mapping.json
├── metadata.json
├── testdata.json
└── uttr-{random string}.pt
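As a rough sketch of how such a directory might be indexed, the snippet below pairs each feature file with an integer speaker label. The JSON schemas are assumptions and are not confirmed by the text above: mapping.json is assumed to hold a `speaker2id` dictionary, and metadata.json a `speakers` dictionary whose entries list a `feature_path` per utterance. The function name `build_index` and the demo file names are hypothetical as well.

```python
import json
import tempfile
from pathlib import Path


def build_index(data_dir, metadata_name="metadata.json", mapping_name="mapping.json"):
    """Return (feature_path, label) pairs for every utterance.

    ASSUMED layout (not confirmed by the source text):
      mapping.json  -> {"speaker2id": {"<speaker>": <int>, ...}}
      metadata.json -> {"speakers": {"<speaker>": [{"feature_path": "uttr-....pt"}, ...]}}
    """
    data_dir = Path(data_dir)
    speaker2id = json.loads((data_dir / mapping_name).read_text())["speaker2id"]
    speakers = json.loads((data_dir / metadata_name).read_text())["speakers"]
    index = []
    for speaker, utterances in speakers.items():
        for utt in utterances:
            index.append((data_dir / utt["feature_path"], speaker2id[speaker]))
    return index


# Demo on a throwaway directory with made-up speakers and file names.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "mapping.json").write_text(
        json.dumps({"speaker2id": {"spk_a": 0, "spk_b": 1}})
    )
    (tmp / "metadata.json").write_text(json.dumps({
        "speakers": {
            "spk_a": [{"feature_path": "uttr-aaa.pt"}, {"feature_path": "uttr-bbb.pt"}],
            "spk_b": [{"feature_path": "uttr-ccc.pt"}],
        }
    }))
    demo_index = build_index(tmp)

print([(path.name, label) for path, label in demo_index])
```

Each indexed path could then be read with `torch.load` to obtain the stored mel-spectrogram.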

This is also a solution to the ML2021Spring HW4 assignment.