vqa · GitHub Topics

#Computer Science#A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

PyTorch vqa pretrained-models multimodal deep-learning captioning dialog textvqa hateful-memes multi-tasking

Python 5.58 k

3 months ago

#Large Language Models#InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, ...

ChatGPT foundation-model gpt gpt-4 gradio husky image-captioning langchain large-language-model multimodal vqa llama vicuna video-generation sam segment-anything click draggan

Python 3.22 k

1 year ago

open-compass / VLMEvalKit

#Large Language Models#Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

gpt-4v large-language-models llava multi-modal openai vqa large-language-model openai-api qwen gpt computer-vision PyTorch gpt4 ChatGPT clip vit evaluation claude gemini

Python 2.82 k

1 day ago

roboflow / maestro

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

captioning fine-tuning florence-2 multimodal objectdetection paligemma phi-3-vision transformers vision-and-language vqa qwen2-vl

Python 2.6 k

3 days ago

BDBC-KG-NLP / QA-Survey-CN

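#Natural Language Processing#A summary of research and applications in intelligent question answering, compiled by the natural language processing research team at the Advanced Innovation Center for Big Data, Beihang University. It covers knowledge-graph-based question answering (KBQA), text-based question answering (TextQA), table-based question answering (TableQA), visual question answering (VisualQA), and machine reading comprehension (MRC). For each task type, it surveys work from both academia and industry.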

survey natural-language-processing question-answering kbqa vqa qa

1.79 k

2 years ago

peteanderson80 / bottom-up-attention

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

vqa visual-question-answering faster-rcnn caffe image-captioning mscoco

Jupyter Notebook 1.45 k

2 years ago

NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

image-captioning language-model multi-modal-learning multi-task-learning vision-language-model vision-and-language vqa

Python 1.31 k

2 years ago

microsoft / Oscar

Oscar and VinVL

vision-and-language pre-training image-captioning vqa oscar

Python 1.05 k

2 years ago

hila-chefer / Transformer-MM-Explainability

[ICCV 2021 - Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-bas...

transformers transformer vqa detr visualization explainability explainable-ai interpretability clip

Jupyter Notebook 859

2 years ago

hengyuan-hu / bottom-up-attention-vqa

An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.

vqa PyTorch

Python 757

1 year ago

Cadene / vqa.pytorch

#Computer Science#Visual Question Answering in PyTorch

vqa deep-learning resnet PyTorch coco torch

Python 731

6 years ago

jayleicn / ClipBERT

[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

PyTorch video-question-answering vqa vision-and-language cvpr2021

Python 722

2 years ago

jokieleung / awesome-visual-question-answering

#Awesome#A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

Awesome Lists vqa multi-modal multi-modal-learning

665

2 years ago

OpenGVLab / Multi-Modality-Arena

#Large Language Models#Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...

chat chatbot ChatGPT gradio large-language-models large-language-model vqa multi-modality vision-language-model

Python 531

1 year ago