#video-question-answering

OpenGVLab / Ask-Anything

#Large Language Models# [CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

captioning-videos, ChatGPT, gradio, langchain, video-question-answering, video-understanding, stablelm, chat, Video, big-model, foundation-models, large-language-models
Python 3.25k
5 months ago
OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

foundation-models, video-understanding, vision-transformer, action-recognition, multimodal, temporal-action-localization, video-question-answering, zero-shot-classification, benchmark, contrastive-learning, self-supervised, instruction-tuning, video-clip
Python 1.91k
22 days ago
jayleicn / ClipBERT

[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

PyTorch, video-question-answering, vqa, vision-and-language, cvpr2021
Python 722
2 years ago
Vision-CAIR / MiniGPT4-video

Official code for the Goldfish model for long video understanding and MiniGPT4-video for short video understanding

video-question-answering, video-understanding
Python 621
6 months ago
X-PLUG / Youku-mPLUG

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

benchmark, Chinese, dataset, mllm, multimodal, multimodal-large-language-models, multimodal-pretraining, Video, video-question-answering, youku
Python 297
1 year ago
X-PLUG / mPLUG-2

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)

foundation-models, mllm, multimodal, multimodal-pretraining, Video, image-retrieval, mplug, video-question-answering, vqa
Python 227
2 years ago
apple / ml-slowfast-llava

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

multimodal-large-language-models, video-question-answering
Python 226
9 months ago
salesforce / ALPRO

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

vision-and-language, video-question-answering, representation-learning, prompt-learning
Python 188
1 month ago
Yui010206 / SeViLA

[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering

mllm, video-question-answering
Python 185
1 year ago
antoyang / FrozenBiLM

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

multimodal-learning, video-understanding, vqa, large-language-models, pre-training, video-question-answering, vision-and-language, visual-question-answering
Python 157
6 months ago
doc-doc / NExT-QA

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

vision-language, video-question-answering, video-understanding
Python 156
1 year ago
tsujuifu / pytorch_violet

A PyTorch implementation of VIOLET

PyTorch, vision-and-language, pre-training, video-question-answering
Python 137
1 year ago
bytedance / Shot2Story

Shot2Story: a new multi-shot video understanding benchmark with comprehensive video summaries and detailed shot-level captions.

benchmark, dataset, large-language-models, video-language-pretraining, video-question-answering, vision-language, video-captioning, research
Python 136
5 months ago
jpthu17 / EMCL

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

cross-modal-retrieval, neurips, video-captioning, video-question-answering
Python 134
1 year ago
jayleicn / TVQAplus

[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering

video-question-answering, dataset, PyTorch
Python 129
3 years ago
antoyang / just-ask

[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

vqa, visual-question-answering, video-question-answering, video-understanding, vision-and-language, pre-training, multimodal-learning
Jupyter Notebook 121
2 years ago
jpthu17 / HBI

[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

cross-modal-retrieval, cvpr, video-question-answering
Python 119
6 months ago
mlvlab / Flipped-VQA

Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

emnlp2023, large-language-models, multi-modal, video-question-answering, visual-question-answering
Python 74
3 months ago
doc-doc / NExT-GQA

Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)

video-question-answering
Python 72
1 year ago
bcmi / Causal-VidQA

[CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The code used in our paper "From Representation to Reasoning: Towa...

commonsense-reasoning, video-question-answering
Python 68
3 months ago