asr

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

asr speech speech-recognition speech-to-text Whisper

Python 17.78 k

3 months ago

NVIDIA-NeMo / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation speaker-recognition asr tts generative-ai multimodal deep-learning neural-networks speaker-diariazation speech-translation speech-synthesis large-language-models

Python 15.72 k

6 hours ago

alphacep / vosk-api

#Android# Vosk is an offline speech recognition toolkit. It supports Python, Java, Node.js, C#, and C++, and recognizes 20+ languages, including Chinese, English, and French.

Jupyter Notebook 13.22 k

8 days ago

PaddlePaddle / PaddleSpeech

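PaddleSpeech is an open-source speech model library based on PaddlePaddle, used to develop a range of key tasks in speech and audio. Typical applications include speech recognition, speech translation, and speech synthesis.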

transformer conformer speech-translation streaming-asr speech-alignment punctuation-restoration streaming-tts speech-synthesis tts asr speech-recognition voice-cloning vocoder voice-recognition self-supervised-learning Whisper

Python 12.23 k

5 days ago

k2-fsa / sherpa-onnx

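#Android# Sherpa-ONNX is a lightweight speech recognition framework built on Kaldi and onnxruntime. It performs speech-to-text, text-to-speech, speaker diarization, and voice activity detection (VAD) without an internet connection. It supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, and WebSocket servers/clients, as well as programming languages including C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, and Rust.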

asr onnx Windows Linux macOS C++ Android iOS raspberry-pi aarch64 arm32 C# .NET mfc speech-to-text text-to-speech vits RISC-V lazarus object-pascal

C++ 7.45 k

3 hours ago

wzpan / wukong-robot

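#LLM# 🤖 wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot / smart speaker project. It supports ChatGPT multi-turn conversation and may be the first open-source smart speaker project to support brain-computer interaction.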

artificial-intelligence speaker asr tts unit Home Assistant raspeberry-pi amazon-echo alexa snowboy google-home anyq muse bci ChatGPT gpt3 openai

Python 6.97 k

1 year ago

FunAudioLLM / SenseVoice

#LLM# Multilingual Voice Understanding Model

artificial-intelligence asr gpt-4o speech-recognition speech-to-text aigc cross-lingual llm Python PyTorch multilingual

Python 6.63 k

1 month ago

jdepoix / youtube-transcript-api

This is a python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headless b...

youtube-api subtitles YouTube transcripts youtube-subtitles youtube-transcripts Python transcript subtitle cli captions youtube-captions youtube-transcript translating-transcripts asr youtube-asr

Python 6.18 k

8 days ago

snakers4 / silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

speech-recognition speech-to-text stt asr pretrained-models english german spanish stt-benchmark PyTorch colab onnx text-to-speech speech speech-synthesis tts

Jupyter Notebook 5.49 k

2 years ago

xiangyuecn / Recorder

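HTML5/JS audio recording in mp3, wav, ogg, webm, amr, g711a, and g711u formats. Supports PC, some Android and iOS browsers, Hybrid Apps (Android and iOS app source code provided), and WeChat. Provides ASR speech-to-text, an H5 voice call/chat example, and DTMF encoding and decoding.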

recorder record JavaScript HTML h5 luyin mp3 wav amr ogg webm WebRTC audio recording asr

JavaScript 5.42 k

6 months ago

MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

asr speaker-diarization speech speech-recognition speech-to-text Whisper

Jupyter Notebook 4.96 k

1 month ago

wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

e2e-models PyTorch asr transformer conformer production-ready automatic-speech-recognition speech-recognition Whisper

Python 4.8 k

8 days ago

PeterH0323 / Streamer-Sales

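#LLM# Streamer-Sales 销冠 (Sales Champion): a sales livestreamer LLM 🛒🎁 that, given a product's features, presents the product in a way designed to spark viewers' desire to buy. 🚀⭐ Includes a detailed data generation pipeline ❗ 📦 Also integrates LMDeploy accelerated inference 🚀, RAG (retrieval-augmented generation) 📚, TTS text-to-speech 🔊, digital human generation 🦸, an Agent that queries the web for real-time information 🌐, ASR speech-to-text 🎙️, a Vue-based frontend 🍍, F...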

chat-application internlm2 llm chatbot text-generation chat ChatGPT gpt rag tts asr digital-human

Python 3.48 k

6 months ago