Hi, everyone! I’m Junjie Li [Homepage], currently a Ph.D. student at Hong Kong Polytechnic University (PolyU) 🇭🇰. This repository aims to help students become familiar with speaker-related tasks while also serving as a resource for my own learning and development.
A summary of speaker-related tasks, such as speaker recognition, verification, diarization, spoofing, privacy, voice conversion, and target speaker extraction.
- Understanding Deep learning [pdf]
- Computer vision: models learning and inference [pdf]
- 深入浅出强化学习:原理入门 (Reinforcement Learning in Plain Terms: An Introduction to the Principles) [pdf]
- Reinforcement Learning [pdf]
- Overview:
- Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
- i-vector
- d-vector (frame-level): Deep neural networks for small footprint text-dependent speaker verification
- x-vector (segment-level):
- X-Vectors: Robust DNN Embeddings for Speaker Recognition
- Deep Neural Network Embeddings for Text-Independent Speaker Verification
- ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
- r-vector: BUT System Description to VoxCeleb Speaker Recognition Challenge 2019
- xi-vector: Xi-Vector Embedding for Speaker Recognition
- Transformer-based:
- Self Multi-Head Attention for Speaker Recognition
- Local Information Modeling with Self-Attention for Speaker Verification
- Speaker Recognition from Raw Waveform with SincNet
- Self-supervised:
- Self-supervised speaker embeddings
- Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics
- PLDA: Probabilistic Linear Discriminant Analysis for Inferences About Identity
- Reshape Dimensions Network for Speaker Recognition
- Guided Speaker Embedding
implementation: wespeaker/models/pooling_layers
- Temporal Average Pooling (TAP)
- Temporal Statistics Pooling (TSTP): X-Vectors: Robust DNN Embeddings for Speaker Recognition
- Attentive Statistics Pooling (ASP): ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
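
The three pooling layers listed above all turn a variable-length sequence of frame-level features into one fixed-size vector. The sketch below illustrates the idea in plain Python; it is not the code in wespeaker/models/pooling_layers. The function names, the list-of-lists input layout, the `eps` value, and the use of population (not unbiased) variance are all assumptions made for illustration. The ASP function is deliberately simplified: it takes one precomputed attention score per frame, whereas in a real model the scores come from a small learned network, and the ECAPA-TDNN paper describes a channel-dependent variant that is not reproduced here.

```python
import math


def temporal_average_pooling(frames):
    """TAP: average each feature dimension over time.

    `frames` is a list of T frame vectors, each a list of D floats
    (an illustrative layout, not the tensor layout of any toolkit).
    """
    num_frames = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]


def temporal_statistics_pooling(frames, eps=1e-7):
    """TSTP: concatenate the per-dimension mean and standard deviation."""
    mean = temporal_average_pooling(frames)
    num_frames = len(frames)
    # Population variance plus a small eps for numerical safety (assumption).
    std = [
        math.sqrt(sum((frame[d] - mean[d]) ** 2 for frame in frames) / num_frames + eps)
        for d in range(len(mean))
    ]
    return mean + std  # output length is 2 * D


def attentive_statistics_pooling(frames, scores, eps=1e-7):
    """ASP (simplified): softmax-weighted mean and standard deviation.

    `scores` holds one unnormalized attention score per frame. Here the
    caller supplies them; in a trained model they would be predicted.
    """
    # Numerically stable softmax over the time axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(frames[0])
    mean = [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]
    std = [
        math.sqrt(sum(w * (frame[d] - mean[d]) ** 2 for w, frame in zip(weights, frames)) + eps)
        for d in range(dim)
    ]
    return mean + std  # output length is 2 * D


frames = [[1.0, 2.0], [3.0, 6.0]]  # T = 2 frames, D = 2 features
print(temporal_average_pooling(frames))  # [2.0, 4.0]
```

With equal attention scores the softmax weights are uniform, so the simplified ASP reduces to TSTP; unequal scores shift the statistics toward the frames with higher scores.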
- VoxCeleb1
- VoxCeleb2
- CycleGAN-VC: Non-parallel Voice Conversion Using Cycle-Consistent Adversarial Networks
- Modeling Pseudo-Speaker Uncertainty in Voice Anonymization
- Investigation of Speaker Representation for Target-Speaker Speech Processing
- Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing
- Emerging Properties in Self-Supervised Vision Transformers