#Computer Science#A PyTorch-based Speech Toolkit
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
#Natural Language Processing#Reading list for research topics in multimodal machine learning
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
#Natural Language Processing#Foundation Architecture for (M)LLMs
#Computer Science#Multilingual Automatic Speech Recognition with word-level timestamps and confidence
WaveNet vocoder
#Computer Science#PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
AI-powered speech denoising and enhancement
#Awesome#A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
#Computer Science#Controllable and fast Text-to-Speech for over 7000 languages!
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
#Computer Science#SincNet is a neural architecture for efficiently processing raw audio samples.
General Speech Restoration
#Data Warehouse#Open source audio annotation tool for humans
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Speech, Language, Audio, Music Processing with Large Language Models
A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection