Awesome Speaker In Speech Field

Hi, everyone! I’m Junjie Li [Homepage], currently a Ph.D. student at Hong Kong Polytechnic University (PolyU) 🇭🇰. This repository aims to help students become familiar with speaker-related tasks while also serving as a resource for my own learning and development.

A summary of speaker-related tasks, such as speaker recognition, verification, diarization, spoofing, privacy, voice conversion, and target speaker extraction.

Book recommendations

Understanding Deep Learning [pdf]
Computer Vision: Models, Learning, and Inference [pdf]
深入浅出强化学习：原理入门 (Reinforcement Learning Made Accessible: An Introduction to the Principles) [pdf]
Reinforcement Learning [pdf]

Basic Knowledge of Machine Learning

Speaker Recognition/Verification:

Toolkit

WeSpeaker

Speaker Models

overview:
- Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
i-vector
d-vector (frame-level): Deep Neural Networks for Small Footprint Text-Dependent Speaker Verification
x-vector (segment-level):
- X-Vectors: Robust DNN Embeddings for Speaker Recognition
- Deep Neural Network Embeddings for Text-Independent Speaker Verification
- ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
r-vector: BUT System Description to VoxCeleb Speaker Recognition Challenge 2019
xi-vector: Xi-Vector Embedding for Speaker Recognition
Transformer based:
- Self Multi-Head Attention for Speaker Recognition
- Local Information Modeling with Self-Attention for Speaker Verification
Speaker Recognition from Raw Waveform with SincNet
Self-supervised:
- Self-supervised speaker embeddings
- Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics
PLDA: Probabilistic Linear Discriminant Analysis for Inferences About Identity
Reshape Dimensions Network for Speaker Recognition
Guided Speaker Embedding

Aggregation Layers

implementation: wespeaker/models/pooling_layers

Temporal Average Pooling (TAP)
Temporal Statistics Pooling (TSTP): X-Vectors: Robust DNN Embeddings for Speaker Recognition
Attentive Statistics Pooling (ASP): ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
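
The three aggregation layers above differ only in how they collapse a variable-length sequence of frame-level features into one fixed-size vector. The following is a minimal pure-Python sketch of that idea, not the WeSpeaker implementation. The function names, the list-of-lists input format, and the `eps` constant are assumptions made for illustration. The attentive variant takes precomputed per-frame scores as an argument, whereas a real model would produce them with a small learned attention network; it also uses one scalar weight per frame, which is simpler than the channel-dependent attention used in ECAPA-TDNN.

```python
import math


def temporal_average_pooling(frames):
    """TAP: average each feature dimension over time.

    frames: list of T frame vectors, each a list of D floats (hypothetical format).
    Returns a list of D floats.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / num_frames for d in range(dim)]


def temporal_statistics_pooling(frames, eps=1e-7):
    """TSTP: concatenate the per-dimension mean and standard deviation.

    Uses the population standard deviation (divide by T); eps keeps the
    square root numerically stable. Returns a list of 2 * D floats.
    """
    num_frames = len(frames)
    mean = temporal_average_pooling(frames)
    std = [
        math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / num_frames + eps)
        for d in range(len(mean))
    ]
    return mean + std


def attentive_statistics_pooling(frames, scores, eps=1e-7):
    """ASP (simplified): weighted mean and std, with weights given by a
    softmax over time of per-frame scores.

    scores: list of T floats, assumed here to be supplied by the caller.
    Returns a list of 2 * D floats.
    """
    # Numerically stable softmax over the time axis.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(frames[0])
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    var = [
        sum(w * (f[d] - mean[d]) ** 2 for w, f in zip(weights, frames))
        for d in range(dim)
    ]
    std = [math.sqrt(v + eps) for v in var]
    return mean + std
```

With equal scores for every frame, the softmax weights become uniform and the attentive variant reduces to plain statistics pooling. This is a useful sanity check when comparing pooling layers.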

Datasets

VoxCeleb1
VoxCeleb2

Challenge

NIST SRE

Voice Conversion

Non-parallel:

CycleGAN-VC: Non-parallel Voice Conversion Using Cycle-Consistent Adversarial Networks

Voice Anonymization

Modeling Pseudo-Speaker Uncertainty in Voice Anonymization

Target Speaker Extraction

Investigation of Speaker Representation for Target-Speaker Speech Processing

Speaker Diarization

Spoofing

Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing

Target Speaker ASR

Personalized VAD

Others

Emerging Properties in Self-Supervised Vision Transformers
