text-to-audio · GitHub Topics

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...

audio-generation audio-synthesis audioldm music-generation naturalspeech2 singing-voice-conversion speech-synthesis text-to-audio text-to-speech vall-e voice-conversion audit fastspeech2 vits emilia maskgct vocoder

Python 9.38 k

4 months ago

denizsafak / abogen

Generate audiobooks from EPUBs, PDFs and text with synchronized captions.

audiobook audiobooks content-creation content-creator epub-converter kokoro media-generation narrator speech-synthesis subtitles text-to-audio text-to-speech tts voice-synthesis kokoro-82m kokoro-tts

Python 3.54 k

21 days ago

hkchengrex / MMAudio

#Computer Science# [CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

audio audio-synthesis computer-vision deep-learning text-to-audio

Python 1.85 k

1 month ago

gitmylo / audio-webui

A web UI for various audio-related neural networks

artificial-intelligence audioldm bark rvc text-to-audio text-to-speech voice-cloning audiocraft music generative-music tts aio all-in-one

Python 1.2 k

4 months ago

declare-lab / tango

A family of diffusion models for text-to-audio generation.

audio-generation diffusion diffusion-models language-models large-language-models text-to-audio

Python 1.19 k

2 months ago

ictnlp / StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

seamless speech speech-recognition speech-synthesis speech-to-text speech-translation translation all-in-one machine-translation streaming-audio text-to-speech asr tts voice text-to-audio non-autoregressive speech-enhancement audio-processing speech-processing

Python 1.15 k

3 months ago

Tencent-Hunyuan / HunyuanVideo-Foley

HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation.

text-to-audio text-to-video

Python 869

6 days ago

declare-lab / TangoFlux

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching

generative-ai text-to-audio

Jupyter Notebook 781

2 months ago

Text-to-Audio / Make-An-Audio

PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model

diffusion-models latent-diffusion latent-space text-to-audio

Python 654

1 year ago

ivcylc / OpenMusic

OpenMusic: SOTA Text-to-music (TTM) Generation

artificial-intelligence diffusion-models music-generation text-to-audio ai-music audioldm diffusion-transformer dit hifi-gan vall-e

Python 609

3 months ago

lucidrains / nuwa-pytorch

#Computer Science# Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in PyTorch

artificial-intelligence deep-learning transformers attention-mechanism text-to-video text-to-audio

Python 549

3 years ago

YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

#Large Language Models# 🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).

aigc large-language-models large-vision-language-models multimodal-generation multimodal-large-language-models multimodal-models multimodality text-to-3d text-to-audio text-to-image text-to-speech text-to-video llm mllm

HTML 509

5 months ago

AMAAI-Lab / mustango

Mustango: Toward Controllable Text-to-Music Generation

diffusion-models large-language-models text-to-audio

Python 374

3 months ago

haidog-yaqub / EzAudio

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

diffusion-models generative-ai text-to-audio

Python 308

2 months ago

TencentARC / AudioStory

AudioStory: Generating Long-Form Narrative Audio with Large Language Models

audio-generation diffusion-models multimodal-large-language-models video-dubbing text-to-audio

Jupyter Notebook 270

12 days ago

happylittlecat2333 / Auffusion

Official code and models for the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"

audio-generation diffusion diffusion-models large-language-models text-to-audio

Jupyter Notebook 187

1 year ago

ilaria-manco / word2wave

Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

text-to-audio audio-generation music-generation ai-music

Python 119

4 years ago

bnsantoso / sub-to-audio

Subtitle to audio: generate audio from any subtitle file using Coqui-ai TTS and synchronize the audio timing with the subtitle timestamps.

text-to-audio text-to-speech python tts audio-processing

Python 117

2 years ago

sony / soundctm

PyTorch implementation of SoundCTM

audio-generation diffusion-models pytorch text-to-audio

Python 96

5 months ago

keonlee9420 / WaveGrad2

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

text-to-speech neural-tts audio synthesis non-autoregressive score-matching duration robust pytorch tts speech-synthesis text-to-audio end-to-end

Python 69

4 years ago