Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...
#Computer Science# [CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
A web UI for various audio-related neural networks
A family of diffusion models for text-to-audio generation.
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Generate audiobooks from EPUBs, PDFs and text with synchronized captions.
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
OpenMusic: SOTA Text-to-music (TTM) Generation
#Computer Science# Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in PyTorch
#Large Language Models# 🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
Mustango: Toward Controllable Text-to-Music Generation
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
Official code and models for the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"
Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.
Subtitle to audio: generate audio from any subtitle file using Coqui-ai TTS and synchronize the audio timing with the subtitle timestamps.
PyTorch implementation of SoundCTM
PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
#Large Language Models# Soundstorm is a cutting-edge AI-powered audio manipulation application designed to provide a rich yet simplified experience for sound designers, algorithmic composers, and experimental audio enthusias...
#Computer Science# Creative Text-to-Audio Generation via Synthesizer Programming @ ICML'24