mrjunjieli/awesome_speaker

Awesome Speaker In Speech Field

Hi, everyone! I’m Junjie Li [Homepage], currently a Ph.D. student at Hong Kong Polytechnic University (PolyU) 🇭🇰. This repository aims to help students become familiar with speaker-related tasks while also serving as a resource for my own learning and development.

A summary of speaker-related tasks, including speaker recognition, verification, diarization, spoofing, privacy, voice conversion, and target speaker extraction.

Book recommendations

  • Understanding Deep Learning [pdf]
  • Computer Vision: Models, Learning, and Inference [pdf]
  • 深入浅出强化学习:原理入门 (Reinforcement Learning Made Accessible: An Introduction to the Principles) [pdf]
  • Reinforcement Learning [pdf]

Basic Knowledge of Machine Learning

Speaker Recognition/Verification:

Toolkit

Speaker Models

  • overview:
    • Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
  • i-vector
  • d-vector (frame-level): Deep Neural Networks for Small Footprint Text-Dependent Speaker Verification
  • x-vector (segment-level):
    • X-Vectors: Robust DNN Embeddings for Speaker Recognition
    • Deep Neural Network Embeddings for Text-Independent Speaker Verification
    • ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
  • r-vector: BUT System Description to VoxCeleb Speaker Recognition Challenge 2019
  • xi-vector: Xi-Vector Embedding for Speaker Recognition
  • Transformer based:
    • Self Multi-Head Attention for Speaker Recognition
    • Local Information Modeling with Self-Attention for Speaker Verification
  • Speaker Recognition from Raw Waveform with SincNet
  • Self-supervised:
    • Self-supervised speaker embeddings
    • Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics
  • PLDA: Probabilistic Linear Discriminant Analysis for Inferences About Identity
  • Reshape Dimensions Network for Speaker Recognition
  • Guided Speaker Embedding
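
The embeddings listed above are typically compared with a scoring back-end to reach a verification decision. PLDA is the back-end named in the list; the sketch below uses cosine similarity instead, a simpler and commonly used alternative that this repository does not list. The function names and the threshold value of 0.5 are placeholders chosen for illustration; in practice the threshold is tuned on a development set.

```python
import math


def cosine_score(emb_a, emb_b):
    """Cosine similarity between two fixed-length speaker embeddings."""
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def verify(enroll_emb, test_emb, threshold=0.5):
    """Accept the trial if the score reaches the threshold.

    The default threshold is an arbitrary placeholder, not a recommended value.
    """
    return cosine_score(enroll_emb, test_emb) >= threshold


if __name__ == "__main__":
    enroll = [0.2, 0.9, -0.1]
    test = [0.25, 0.8, -0.05]
    print(cosine_score(enroll, test), verify(enroll, test))
```

Cosine scoring ignores embedding length and needs no training, which is why it is often used as a first baseline before a trained back-end such as PLDA.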

Aggregation Layers

implementation: wespeaker/models/pooling_layers

  • Temporal Average Pooling (TAP)
  • Temporal Statistics Pooling (TSTP): X-Vectors: Robust DNN Embeddings for Speaker Recognition
  • Attentive Statistics Pooling (ASP): ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
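
The three pooling layers above all map a variable-length sequence of frame-level features to one fixed-length vector. The sketch below illustrates the idea in plain Python and is not the wespeaker implementation. It assumes `frames` is a T x D list of lists, uses the population variance, and replaces the learned attention network of ASP with a single hypothetical scoring vector `score_weights` that produces one scalar score per frame.

```python
import math


def temporal_average_pooling(frames):
    """TAP: mean of each feature dimension over time (frames is T x D)."""
    num_frames = len(frames)
    return [sum(col) / num_frames for col in zip(*frames)]


def temporal_statistics_pooling(frames, eps=1e-8):
    """TSTP: concatenate the per-dimension mean and standard deviation."""
    num_frames = len(frames)
    mean = temporal_average_pooling(frames)
    std = [
        math.sqrt(sum((x - m) ** 2 for x in col) / num_frames + eps)
        for col, m in zip(zip(*frames), mean)
    ]
    return mean + std  # output length is 2 * D


def attentive_statistics_pooling(frames, score_weights, eps=1e-8):
    """ASP (simplified): attention-weighted mean and standard deviation.

    score_weights is a hypothetical D-dimensional vector standing in for
    the learned attention network; it yields one scalar score per frame.
    """
    scores = [sum(w * x for w, x in zip(score_weights, f)) for f in frames]
    max_score = max(scores)  # subtract the max for a numerically stable softmax
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    alphas = [e / total for e in exp_scores]

    dim = len(frames[0])
    mean = [sum(a * f[d] for a, f in zip(alphas, frames)) for d in range(dim)]
    var = [
        sum(a * (f[d] - mean[d]) ** 2 for a, f in zip(alphas, frames))
        for d in range(dim)
    ]
    return mean + [math.sqrt(v + eps) for v in var]


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
    print(temporal_average_pooling(frames))
    print(temporal_statistics_pooling(frames))
    print(attentive_statistics_pooling(frames, score_weights=[0.1, 0.3]))
```

With all-zero `score_weights` the attention weights become uniform, so this simplified ASP reduces to TSTP.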

Datasets

  • VoxCeleb1
  • VoxCeleb2

Challenge

Voice Conversion

Non-parallel:

  • CycleGAN-VC: Non-parallel Voice Conversion Using Cycle-Consistent Adversarial Networks
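
As a brief illustration of what "cycle-consistent" means here, the cycle-consistency term can be written as below. The notation is an assumption made for this note: $G_{X \to Y}$ and $G_{Y \to X}$ denote the two generators mapping between source-speaker features $x$ and target-speaker features $y$. The full training objective also includes adversarial and identity-mapping terms, which are omitted; see the paper for the exact formulation and weights.

```latex
\mathcal{L}_{\mathrm{cyc}}
  = \mathbb{E}_{x}\!\left[ \left\| G_{Y \to X}\big(G_{X \to Y}(x)\big) - x \right\|_{1} \right]
  + \mathbb{E}_{y}\!\left[ \left\| G_{X \to Y}\big(G_{Y \to X}(y)\big) - y \right\|_{1} \right]
```

Because each sample is only compared with its own round-trip reconstruction, no time-aligned parallel utterances from the two speakers are needed.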

Voice Anonymization

  • Modeling Pseudo-Speaker Uncertainty in Voice Anonymization

Target Speaker Extraction

  • Investigation of Speaker Representation for Target-Speaker Speech Processing

Speaker Diarization

Spoofing

  • Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing

Target Speaker ASR

Personalized VAD

Others

  • Emerging Properties in Self-Supervised Vision Transformers
