#Large Language Models#:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, tr...
#Large Language Models#Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, ...
#Computer Science#YuE: Open Full-song Music Generation Foundation Model, similar to Suno.ai but open
AudioLDM: Generate speech, sound effects, music and beyond, with text.
A single Gradio + React WebUI with extensions for ACE-Step, Kimi Audio, Piper TTS, GPT-SoVITS, CosyVoice, XTTSv2, DIA, Kokoro, OpenVoice, ParlerTTS, Stable Audio, MMS, StyleTTS2, MAGNet, AudioGen, Mus...
#Computer Science#Audio generation using diffusion models, in PyTorch.
#Computer Science#A timeline of the latest AI models for audio generation, starting in 2023!
#Computer Science#Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
A family of diffusion models for text-to-audio generation.
InspireMusic: A toolkit designed for music, song, and audio generation
Official PyTorch implementation of BigVGAN (ICLR 2023)
#Data Warehouse#AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio app...
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.
Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale te...
Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
#Computer Science#Audio Development Tools (ADT) is a project for advancing sound, speech, and music technologies, featuring components for machine learning, sound synthesis, speech and music generation, signal processi...
Source code for "Taming Visually Guided Sound Generation" (oral at BMVC 2021)