#Computer Science# Faster Whisper transcription with CTranslate2
#Large Language Models# Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Phi4,...
Multimodal Learning Method MLA for CVPR 2024
[CVPR 2024] Deformable Convolution v4
Zero-Shot Speech Editing and Text-to-Speech in the Wild
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
#Computer Science# 🐍 🤖 Pip installable package for StyleTTS 2 human-level text-to-speech and voice cloning
Referring Video Object Segmentation / Multi-Object Tracking Repo
A Python library for computing the Mel-Cepstral Distance (Mel-Cepstral Distortion, MCD) between two inputs. This implementation is based on the method proposed by Robert F. Kubichek in "Mel-Cepstral D...
#Computer Science# NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment