visual-question-answering

salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

vision-language, vision-and-language-pre-training, image-text-retrieval, image-captioning, visual-question-answering, vision-language-transformer
Jupyter Notebook 5.31 k
10 months ago
OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

multimodal, pretraining, image-captioning, text-to-image-synthesis, visual-question-answering, referring-expression-comprehension, vision-language, pretrained-models, prompt, prompt-tuning, Chinese
Python 2.5 k
1 year ago
peteanderson80 / bottom-up-attention

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

vqa, visual-question-answering, faster-rcnn, caffe, image-captioning, mscoco
Jupyter Notebook 1.45 k
2 years ago
lucidrains / flamingo-pytorch

#Computer Science# Implementation of 🦩 Flamingo, a state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch

artificial intelligence, attention-mechanism, deep learning, transformers, visual-question-answering
Python 1.24 k
3 years ago
YehLi / xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...

image-captioning, video-captioning, vision-and-language, pretraining, cross-modal-retrieval, visual-question-answering, tden
Python 969
2 years ago
richard-peng-xia / awesome-multimodal-in-medical-imaging

A collection of resources on applications of multi-modal learning in medical imaging.

Medical imaging, multimodal-deep-learning, multimodal-learning, visual-question-answering, large-language-models, large-multimodal-models, multimodal-large-language-models
760
11 days ago
jnhwkim / ban-vqa

Bilinear attention networks for visual question answering

visual-question-answering, attention, pytorch-implmention
Python 545
2 years ago
MILVLG / mcan-vqa

Deep Modular Co-Attention Networks for Visual Question Answering

visual-question-answering, attention
Python 452
4 years ago
MMMU-Benchmark / MMMU

#Natural Language Processing# This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

machine vision, deep learning, deep neural networks, evaluation, foundation-models, large-language-models, large-multimodal-models, large language model, machine learning, multimodal, multimodal-deep-learning, multimodal-learning, multimodality, natural language processing, question-answering, STEM, visual-question-answering
Python 440
1 month ago
zjukg / KG-MM-Survey

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

cross-modal-retrieval, Entity resolution, image-classification, image-generation, information-extraction, knowledge-graph, knowledge-graph-embeddings, large-language-models, multi-modal-learning, paper-list, survey, surveys, visual-question-answering, awsome
425
6 months ago
davidmascharka / tbd-nets

#Computer Science# PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

machine learning, PyTorch, visualization, deep learning, visual-question-answering, vqa, neural-networks
Jupyter Notebook 348
4 years ago
MILVLG / openvqa

#Computer Science# A lightweight, scalable, and general framework for visual question answering research

visual-question-answering, vqa, PyTorch, deep learning, benchmark
Python 323
4 years ago
lupantech / MathVista

#Computer Science# MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts

large-language-models, machine learning, mathematics, science, visual-question-answering
Jupyter Notebook 312
7 months ago
MILVLG / prophet

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

gpt-3, multimodal-deep-learning, prompt-engineering, PyTorch, visual-question-answering
Python 274
2 years ago
HanXinzi-AI / awesome-computer-vision-resources

#Face Recognition# A collection of computer vision projects and tools.

machine vision, image-classification, image-segmentation, semantic-segmentation, Medical imaging, OCR, visual-question-answering, image-captioning, super-resolution, Generative Adversarial Network, face-detection, face-recognition, autonomous-vehicles, autonomous-driving, model-compression, Tensorflow, PyTorch, paddlepaddle
268
1 year ago
Cyanogenoid / pytorch-vqa

Strong baseline for visual question answering

PyTorch, vqa, visual-question-answering, baseline
Python 240
2 years ago
qiantianwen / NuScenes-QA

[AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.

autonomous-driving, vision-language, visual-question-answering
Python 192
7 months ago
MMStar-Benchmark / MMStar

#Large Language Model# [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"

evaluation, large-language-models, large-multimodal-models, large-vision-language-model, large-vision-language-models, large language model, multimodal, multimodal-learning, multimodality, visual-question-answering
Python 181
9 months ago
Yushi-Hu / tifa

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

image-to-text, large-language-models, text-to-image, visual-question-answering
Python 166
1 year ago
markdtw / vqa-winner-cvprw-2017

PyTorch implementation of the winner of the VQA Challenge Workshop at CVPR'17

PyTorch, visual-question-answering
Python 163
6 years ago