speech-synthesis · GitHub Topics

#Computer Science#🐸💬 - A deep learning library for text-to-speech (TTS) synthesis

Python text-to-speech deep-learning speech PyTorch tts vocoder tacotron glow-tts melgan speaker-encoder hifigan speaker-encodings multi-speaker-tts tts-model speech-synthesis voice-cloning voice-synthesis voice-conversion

Python 41.73 k

1 year ago

leon-ai / leon

🧠 Leon is your open-source personal assistant.

leon personal-assistant Node.js Python artificial-intelligence speech-to-text text-to-speech speech-recognition speech-synthesis flite assistant virtual-assistant chatbot Bot voice-assistant automation offline privacy ai-assistant

TypeScript 16.52 k

19 hours ago

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation speaker-recognition asr tts generative-ai multimodal deep-learning neural-networks speaker-diariazation speech-translation speech-synthesis large-language-models

Python 15.26 k

4 hours ago

NVIDIA / DeepLearningExamples

#Natural Language Processing#Deep learning examples

computer-vision deep-learning drug-discovery forecasting large-language-models mxnet paddlepaddle PyTorch recommender-systems speech-recognition speech-synthesis Tensorflow tensorflow2 translation natural-language-processing

Jupyter Notebook 14.42 k

1 year ago

PaddlePaddle / PaddleSpeech

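PaddleSpeech is an open-source speech model library built on PaddlePaddle for developing a range of key speech and audio tasks. Typical applications include speech recognition, speech translation, and speech synthesis.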

transformer conformer speech-translation streaming-asr speech-alignment punctuation-restoration streaming-tts speech-synthesis tts asr speech-recognition voice-cloning vocoder voice-recognition self-supervised-learning Whisper

Python 12.11 k

9 days ago

rhasspy / piper

A fast, local neural text to speech system

speech-synthesis text-to-speech tts

C++ 9.71 k

21 days ago

espnet / espnet

#Computer Science#End-to-End Speech Processing Toolkit

deep-learning end-to-end chainer PyTorch kaldi speech-recognition speech-synthesis speech-translation machine-translation voice-conversion speech-enhancement speech-separation singing-voice-synthesis speaker-diarization text-to-speech

Python 9.34 k

2 days ago

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...

audio-generation audio-synthesis audioldm music-generation naturalspeech2 singing-voice-conversion speech-synthesis text-to-audio text-to-speech vall-e voice-conversion audit fastspeech2 vits emilia maskgct vocoder

Python 9.27 k

2 months ago

voicepaw / so-vits-svc-fork

#Computer Science#A fork of so-vits-svc 4.0 (V1) that supports real-time inference and a graphical inference interface, and is compatible with its models.

sovits vits voice-conversion so-vits-svc hubert softvc realtime voice-changer deep-learning PyTorch speech-synthesis Generative Adversarial Network lightning pytorch-lightning Hacktoberfest

Python 9.08 k

3 days ago

rany2 / edge-tts

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

tts speech-synthesis text-to-speech

Python 8.74 k

3 months ago

netease-youdao / EmotiVoice

#Computer Science#EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

PyTorch speech speech-synthesis tts multi-speaker text-to-speech deep-learning prompt emotivoice artificial-intelligence Python emotion style

Python 8.11 k

1 year ago

jaywalnut310 / vits

#Computer Science#VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

tts text-to-speech PyTorch deep-learning speech-synthesis

Python 7.58 k

2 years ago

yl4579 / StyleTTS2

#Computer Science#StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

deep-learning PyTorch speaker-adaptation speech-synthesis text-to-speech tts wavlm diffusion-models latent-diffusion latent-diffusion-models Generative Adversarial Network

Python 5.87 k

1 year ago

snakers4 / silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

speech-recognition speech-to-text stt asr pretrained-models english german spanish stt-benchmark PyTorch colab onnx text-to-speech speech speech-synthesis tts

Jupyter Notebook 5.42 k

2 years ago

espeak-ng / espeak-ng

#Android#eSpeak NG is an open-source speech synthesizer that supports more than a hundred languages and accents.

espeak-ng espeak Android text-to-speech speech-synthesis

C 5.34 k

19 days ago

MoonInTheRiver / DiffSinger

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code

text-to-speech diffusion-speedup tts aaai2022 singing-synthesis diffusion-model speech-synthesis singing-voice-synthesis singing-voice singing-voice-database MIDI

Python 4.55 k

4 months ago

abus-aikorea / voice-pro

Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isola...