GitHub Chinese Community

#audio-visual-speech-recognition

https://static.github-zh.com/github_avatars/modelscope?size=40
modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

conformer, PyTorch, speech-recognition, paraformer, punctuation, speaker-diarization, rnnt, audio-visual-speech-recognition, pretrained-model, voice-activity-detection, Whisper, dfsmn, vad, speechgpt, speechllm
Python 11.02 k
19 days ago
https://static.github-zh.com/github_avatars/smeetrs?size=40
smeetrs / deep_avsr

A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.

audio-visual-speech-recognition, speech-recognition, automatic-speech-recognition, speech-to-text
Python 233
1 year ago
https://static.github-zh.com/github_avatars/ankurbhatia24?size=40
ankurbhatia24 / MULTIMODAL-EMOTION-RECOGNITION

#Computer Science# Human emotion understanding using a multimodal dataset.

deep learning, machine learning, Keras, audio-visual-speech-recognition, Python, Tensorflow, librosa
Jupyter Notebook 98
5 years ago
https://static.github-zh.com/github_avatars/georgesterpu?size=40
georgesterpu / Taris

#Computer Science# Transformer-based online speech recognition system with TensorFlow 2

online, speech-recognition, audio-visual-speech-recognition, multimodal, multimodal-deep-learning, transformer, Tensorflow, tensorflow2, Python, mahcine-learning, deep learning
Python 26
4 years ago
https://static.github-zh.com/github_avatars/umbertocappellazzo?size=40
umbertocappellazzo / Llama-AVSR

[ICASSP 2025] Official Pytorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners".

audio-visual-speech-recognition, large-language-models
Python 22
3 months ago
https://static.github-zh.com/github_avatars/Sreyan88?size=40
Sreyan88 / LipGER

#Large Language Models# Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition

audio-visual-speech-recognition, generative-ai, large language models, prompting, speech-recognition
Python 17
1 year ago
https://static.github-zh.com/github_avatars/david-gimeno?size=40
david-gimeno / tailored-avsr

Official source code for the paper "Tailored Design of Audio-Visual Speech Recognition Models using Branchformers"

audio-visual-speech-recognition, interpretability
Python 12
4 months ago
https://static.github-zh.com/github_avatars/sungnyun?size=40
sungnyun / avsr-temporal-dynamics

(SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition

audio-visual-speech-recognition
Python 12
8 months ago
https://static.github-zh.com/github_avatars/sungnyun?size=40
sungnyun / cav2vec

(ICLR 2025) Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation

audio-visual-speech-recognition, self-supervised-learning
Python 10
2 months ago
https://static.github-zh.com/github_avatars/lzuwei?size=40
lzuwei / end-to-end-multiview-lipreading

#Computer Science# End to End Multiview Lip Reading

audio-visual-speech-recognition, deep learning, end-to-end-learning
Python 10
7 years ago
https://static.github-zh.com/github_avatars/hmeutzner?size=40
hmeutzner / kaldi-avsr

Kaldi-based audio-visual speech recognition

speech-recognition, kaldi, deep neural networks, asr, audio-visual-speech-recognition
Shell 6
3 years ago
https://static.github-zh.com/github_avatars/aidayang?size=40
aidayang / FunASR-OneClick

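Real-time speech recognition edition of FunASR: recognizes audio from the microphone and audio played on the computer; voice-typing software for PCs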

audio-visual-speech-recognition, conformer, dfsmn, paraformer, pretrained-models, punctuation, PyTorch, rnnt, speaker-diarization, speech-recognition, speechgpt, speechllm, vad, voice-activity-detection, Whisper
5
15 days ago
https://static.github-zh.com/github_avatars/karlsimsBBC?size=40
karlsimsBBC / cassette-bot

🤖 📼 Command-line tool for remixing videos with time-coded transcriptions.

text-to-video, audio-visual-speech-recognition, Video
Python 5
6 years ago
https://static.github-zh.com/github_avatars/zulfiqar-ali01?size=40
zulfiqar-ali01 / audio-visual-Transcription

Real-Time Audio-Visual Speech Recognition

audio-processing, audio-visual-speech-recognition
Python 4
10 months ago
https://static.github-zh.com/github_avatars/luomingshuang?size=40
luomingshuang / lipreading_with_icefall

In this repository, I try to use k2, icefall and Lhotse for lip reading. I will modify it for the lip reading task. Many different lip-reading datasets should be added. -_-

audio-visual-speech-recognition
Python 2
3 years ago
https://static.github-zh.com/github_avatars/Remi-Gau?size=40
Remi-Gau / McGurk_prior_code

Code related to the fMRI experiment on the contextual modulation of the McGurk Effect

fmri, audio-visual-speech-recognition
MATLAB 1
3 years ago
https://static.github-zh.com/github_avatars/MaazKhan98?size=40
MaazKhan98 / Multimodal-Emotion-Recognition-speech-facial-and-body-gestures

#Computer Science# Human emotion understanding using a multimodal dataset

audio-visual-speech-recognition, deep learning, machine learning, Tensorflow
Jupyter Notebook 0
3 years ago