StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
A low-bitrate single-codebook 16 kHz speech codec based on focal modulation
This repo contains the source code of the first deep-learning-based singing voice beat tracking system. It leverages WavLM and DistilHuBERT pre-trained speech models to create vocal embeddings and trai...
A neural speech codec based on discrete WavLM representations
A collection of audio codecs with a standardized API
This repository contains the code for the main part of my master's thesis in Data Science & Engineering at Politecnico di Torino
In this repository, the WavLM model is applied to the speaker verification task on both good-quality and poor-quality data, and the PyCM library is used for evaluation.
SOTA method for self-supervised speaker verification leveraging a large-scale pretrained ASR model.
This repository combines `WavLM`, a powerful speech representation model from Microsoft, with `MSDD` (Multi-Scale Diarization Decoder), a state-of-the-art approach for speaker diarization from Nvidia...
Universal Pooling Method for Speaker Verification Utilizing Pre-trained Multi-layer Features, 2025 preprint
This repo contains code used in the paper "Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection"
WavLM Large + RawNetX Speaker Verification Base: End-to-End Speaker Verification Architecture
Acoustic Transformer Models for Audio Classification
CryCeleb2023 experiments
kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization