# visual-question-answering

https://static.github-zh.com/github_avatars/salesforce?size=40

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Jupyter Notebook 5.48 k
1 year ago
https://static.github-zh.com/github_avatars/OFA-Sys?size=40

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Python 2.53 k
1 year ago
https://static.github-zh.com/github_avatars/peteanderson80?size=40

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

Jupyter Notebook 1.45 k
3 years ago
https://static.github-zh.com/github_avatars/lucidrains?size=40

#Computer Science# Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch

Python 1.26 k
3 years ago
https://static.github-zh.com/github_avatars/YehLi?size=40

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...

Python 968
3 years ago
https://static.github-zh.com/github_avatars/jnhwkim?size=40

Bilinear attention networks for visual question answering

Python 545
2 years ago
https://static.github-zh.com/github_avatars/MILVLG?size=40

Deep Modular Co-Attention Networks for Visual Question Answering

Python 455
5 years ago
https://static.github-zh.com/github_avatars/davidmascharka?size=40

#Computer Science# PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

Jupyter Notebook 348
4 years ago
https://static.github-zh.com/github_avatars/lupantech?size=40

#Computer Science# MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts

Jupyter Notebook 331
10 months ago
https://static.github-zh.com/github_avatars/MILVLG?size=40

#Computer Science# A lightweight, scalable, and general framework for visual question answering research

Python 327
4 years ago
https://static.github-zh.com/github_avatars/MILVLG?size=40

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

Python 277
3 months ago
https://static.github-zh.com/github_avatars/Cyanogenoid?size=40
Python 241
3 years ago
https://static.github-zh.com/github_avatars/qiantianwen?size=40

[AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.

Python 207
10 months ago
https://static.github-zh.com/github_avatars/Yushi-Hu?size=40

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

Python 173
1 year ago
https://static.github-zh.com/github_avatars/markdtw?size=40

PyTorch implementation of the winner of the VQA Challenge Workshop at CVPR'17

Python 163
7 years ago