speech-processing

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

PyTorch speech-processing speaker-diarization voice-activity-detection pretrained-models speaker-recognition speaker-verification

Jupyter Notebook 8.29 k

1 day ago

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

voice-detection voice-recognition voice-commands PyTorch onnx voice-activity-detection voice-control onnx-runtime onnxruntime speech speech-processing vad

Python 6.85 k

21 days ago

pliang279 / awesome-multimodal-ml

#Natural Language Processing#Reading list for research topics in multimodal machine learning

multimodal-learning machine-learning representation-learning natural-language-processing computer-vision speech-processing Robotics healthcare reading-list deep-learning reinforcement-learning

6.61 k

1 year ago

microsoft / torchscale

#Natural Language Processing#Foundation Architecture for (M)LLMs

computer-vision machine-learning multimodal natural-language-processing pretrained-language-model speech-processing transformer translation

Python 3.11 k

1 year ago

linto-ai / whisper-timestamped

#Computer Science#Multilingual Automatic Speech Recognition with word-level timestamps and confidence

deep-learning speech speech-recognition speech-to-text asr machine-learning Python PyTorch attention-is-all-you-need attention-mechanism attention-model speaker-diarization speech-processing transformers Whisper

Python 2.59 k

7 days ago

r9y9 / wavenet_vocoder

WaveNet vocoder

wavenet speech-synthesis speech-processing PyTorch Python neural-vocoder speech

Python 2.37 k

2 years ago

r9y9 / deepvoice3_pytorch

#Computer Science#PyTorch implementation of convolutional neural network-based text-to-speech synthesis models

tts speech-synthesis end-to-end speech-processing machine-learning PyTorch Python multi-speaker

Python 1.98 k

2 years ago

resemble-ai / resemble-enhance

AI-powered speech denoising and enhancement

denoise speech-denoising speech-enhancement speech-processing

Python 1.97 k

9 months ago

wq2012 / awesome-diarization

#Awesome#A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

speaker-diarization Awesome Lists machine-learning speech-recognition speech-processing deep-learning

1.8 k

2 months ago

DigitalPhonetics / IMS-Toucan

#Computer Science#Controllable and fast Text-to-Speech for over 7000 languages!

text-to-speech toolkit speech-synthesis deep-learning speech-processing tts PyTorch speech

Python 1.64 k

3 months ago

TEN-framework / ten-vad

Voice Activity Detection (VAD): low-latency, high-performance, and lightweight

conversational-ai real-time speech-processing vad voice-activity-detection voice-commands voice-recognition audio automatic-speech-recognition speech silero-vad

C 1.42 k

14 days ago

coqui-ai / open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

tts stt speech-to-text text-to-speech speech-recognition speech-synthesis speech-processing voice-recognition voice-activity-detection voice-cloning speech-separation

1.36 k

1 year ago

haoheliu / voicefixer

General Speech Restoration

speech-processing speech-synthesis speech-enhancement speech-analysis speech tts denoise super-resolution vocoder

Python 1.21 k

7 months ago

mravanelli / SincNet

#Computer Science#SincNet is a neural architecture for efficiently processing raw audio samples.

Python 1.2 k

4 years ago

ictnlp / StreamSpeech

StreamSpeech is an "All in One" seamless model for offline and simultaneous speech recognition, speech translation, and speech synthesis.

seamless speech speech-recognition speech-synthesis speech-to-text speech-translation translation all-in-one machine-translation streaming-audio text-to-speech asr tts voice text-to-audio non-autoregressive speech-enhancement audio-processing speech-processing

Python 1.15 k

3 months ago

midas-research / audino

#Data Warehouse#Open-source audio annotation tool for humans

audio-processing speech-processing machine-learning annotation-tool audio-annotation Python dataset

JavaScript 1.11 k

7 months ago

X-LANCE / SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

audio-processing large-language-models multimodal-large-language-models peft speech-processing

Python 890

11 days ago

Ryuk17 / SpeechAlgorithms

You can find the speech algorithms you want here

speech-processing

C 830

2 months ago

nyrahealth / CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

asr audio detection recognition speech speech-recognition transcription Whisper speech-processing

Python 814

3 months ago
