#Computer Science#A PyTorch-based Speech Toolkit
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
#Natural Language Processing#Reading list for research topics in multimodal machine learning
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
#Natural Language Processing#Foundation Architecture for (M)LLMs
#Computer Science#Multilingual Automatic Speech Recognition with word-level timestamps and confidence
WaveNet vocoder
#Computer Science#PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
AI powered speech denoising and enhancement
#Awesome#A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
#Computer Science#Controllable and fast Text-to-Speech for over 7000 languages!
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
General Speech Restoration
#Computer Science#SincNet is a neural architecture for efficiently processing raw audio samples.
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
#Data Warehouse#Open source audio annotation tool for humans
Voice Activity Detector (VAD) from TEN: low-latency, high-performance and lightweight
Speech, Language, Audio, Music Processing with Large Language Model
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection