KAN-TTS is a speech-synthesis training framework. Please try the demos we have posted at https://modelscope.cn/models?page=1&tasks=text-to-speech
2022-08-18
No
2023-12-28T06:21:05Z
#Large Language Models# AgentScope: Agent-Oriented Programming for Building LLM Applications
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Enjoy the magic of Diffusion models!
#Large Language Models# Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Phi4,...
Open-Sora: a fully open-source, efficient solution for reproducing Sora-like video generation
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
[CVPR 2024] Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework.
#Large Language Models# Generate high-definition short videos with one click using large AI models.
UT-Sarulab MOS prediction system using SSL models
[ECCV 2024] Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance