speech · GitHub Topics

#Computer Science# 🐸💬 - A deep learning library for TTS speech synthesis

Python text-to-speech deep-learning speech PyTorch tts vocoder tacotron glow-tts melgan speaker-encoder hifigan speaker-encodings multi-speaker-tts tts-model speech-synthesis voice-cloning voice-synthesis voice-conversion

Python 41.72 k

1 year ago

babysor / MockingBird

#Computer Science# 🚀 AI voice cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time

artificial-intelligence speech PyTorch deep-learning text-to-speech tts

Python 36.5 k

8 months ago

svc-develop-team / so-vits-svc

#Computer Science# SoftVC VITS Singing Voice Conversion

artificial-intelligence audio-analysis Generative Adversarial Network singing-voice-conversion so-vits-svc sovits variational-inference vc vits voice voice-conversion voiceconversion voice-changer flow deep-learning PyTorch speech

Python 27.44 k

2 years ago

huggingface / datasets

#Natural Language Processing# 🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

natural-language-processing datasets PyTorch TensorFlow pandas NumPy computer-vision machine-learning deep-learning speech Hacktoberfest

Python 20.44 k

1 day ago

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

asr speech speech-recognition speech-to-text Whisper

Python 17.05 k

1 month ago

IDEA-Research / Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

open-vocabulary-detection open-vocabulary-segmentation data-generation automatic-labeling-system caption speech image-editing

Jupyter Notebook 16.7 k

1 year ago

kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

kaldi C++ CUDA Shell speech-recognition speech-to-text speaker-verification speaker-id speech

Shell 15.01 k

8 days ago

AIGC-Audio / AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

audio gpt music sound speech talking-head

Python 10.19 k

1 year ago

mozilla / TTS

#Computer Science# 🤖💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

deep-learning text-to-speech Python PyTorch tacotron tts speaker-encoder dataset-analysis tacotron2 tensorflow2 vocoder melgan glow-tts speech

Jupyter Notebook 9.93 k

2 years ago

modelscope / modelscope

#Natural Language Processing# ModelScope: bring the notion of Model-as-a-Service to life.

natural-language-processing cv speech multi-modal science deep-learning machine-learning Python

Python 8.16 k

5 days ago

netease-youdao / EmotiVoice

#Computer Science# EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

PyTorch speech speech-synthesis tts multi-speaker text-to-speech deep-learning prompt emotivoice artificial-intelligence Python emotion style

Python 8.11 k

1 year ago

PaddlePaddle / models

#Natural Language Processing# Officially maintained, supported by PaddlePaddle, including CV, NLP, Speech, Rec, TS, big models and so on.

paddlepaddle deep-learning neural-network computer-vision natural-language-processing recommendation speech cv models

Python 6.93 k

6 months ago

TalAter / annyang

💬 Speech recognition for your site

speech-recognition speech speech-to-text voice

JavaScript 6.66 k

1 year ago

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

voice-detection voice-recognition voice-commands PyTorch onnx voice-activity-detection voice-control onnx-runtime onnxruntime speech speech-processing vad

Python 6.39 k

2 months ago

snakers4 / silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

speech-recognition speech-to-text stt asr pretrained-models english german spanish stt-benchmark PyTorch colab onnx text-to-speech speech speech-synthesis tts

Jupyter Notebook 5.41 k

2 years ago

MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

asr speaker-diarization speech speech-recognition speech-to-text Whisper

Jupyter Notebook 4.78 k

7 days ago

metavoiceio / metavoice-src

#Computer Science# Foundational model for human-like, expressive TTS

text-to-speech artificial-intelligence deep-learning PyTorch speech speech-synthesis tts voice-clone zero-shot-tts

Python 4.14 k

1 year ago

huggingface / speech-to-speech

#Computer Science# Speech To Speech: an effort for an open-sourced and modular GPT4-o

artificial-intelligence assistant language-model machine-learning Python speech speech-synthesis speech-to-text speech-translation

Python 4.12 k

3 months ago

fixie-ai / ultravox

#Large Language Models# A fast multimodal LLM for real-time voice

artificial-intelligence large-language-model slm speech

Python 4.11 k

24 days ago

jianchang512 / stt

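Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats

speech speech-recognition speech-to-text stt

Python 3.66 k

8 months ago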