multimodal-datasets · GitHub Topics

#Computer Science# LAVIS - A One-stop Library for Language-Vision Intelligence

deep-learning deep-learning-library image-captioning salesforce vision-and-language vision-framework vision-language-pretraining vision-language-transformer visual-question-anwsering multimodal-datasets multimodal-deep-learning

Jupyter Notebook 10.89 k

10 months ago

remyxai / VQASynth

Compose multimodal datasets 🎹

dataset-generation multimodal-datasets multimodal-deep-learning synthetic-dataset-generation

Python 475

1 month ago

drmuskangarg / Multimodal-datasets

This repository is built in association with our position paper on "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers". As a part of this release we share the information...

multimodal-datasets

305

4 years ago

AnkurDeria / MFT

#Computer Science# PyTorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.

deep-learning multimodal-datasets multimodal-deep-learning remote-sensing transformer-models

Jupyter Notebook 219

2 years ago

wisdomikezogwo / quilt1m

[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.

clip-model histopathology multimodal-datasets vlm

Python 168

2 years ago

yuanxiaosc / Multimodal-short-video-dataset-and-baseline-classification-model

A dataset of 500,000 multimodal short videos and baseline models (TensorFlow 2.0).

multimodal-datasets classification-model Tensorflow

Jupyter Notebook 131

6 years ago

marslanm / Multimodality-Representation-Learning

This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the survey just accepted https://dl....

cross-modal multimodal-datasets multimodal-deep-learning multimodal-pre-trained-model transformer-models vision-language-pretraining

3 months ago

roboflow / rf100-vl

Code from the paper "Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models"

computer-vision multimodal-datasets object-detection

Python 79

3 months ago

piresramon / gpt-4-enem

Code and data to evaluate LLMs on the ENEM, the main standardized Brazilian university admission exam.

artificial-intelligence llm-inference large-language-models multimodal-datasets

Python 50

9 months ago

Yuco-Z / Awesome-Multi-Modal-Dialog

#Awesome# [Paperlist] Awesome paper list of multimodal dialog, including methods, datasets and metrics

Awesome Lists dialogue multimodal multimodal-deep-learning multimodal-datasets multimodal-learning

8 months ago

JunweiLiang / FVTA_MemexQA

Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19

visual-question-answering vision-and-language multimodal-deep-learning multimodal-datasets

Python 32

6 years ago

ddw2AIGROUP2CQUPT / Large-Scale-Multimodal-Face-Datasets

Millions-Level Face/Human-Scene Image-Text Datasets

multimodal-datasets

3 months ago

OlehOnyshchak / pyWikiMM

Collects a multimodal dataset of Wikipedia articles and their images

wikipedia multimodal multimodality multimodal-datasets multimodal-learning database data-cleaning data-collection data-processing

Python 16

2 years ago

deepmancer / vlm-toolbox

#Computer Science# Vision-Language Models Toolbox: Your all-in-one solution for multimodal research and experimentation

clip deep-learning deep-learning-library multimodal-datasets multimodal-deep-learning multimodal-learning prompt-tuning vision-and-language vision-framework vision-language-transformer zero-shot-classification PyTorch transformers

Jupyter Notebook 11

7 months ago

pspdada / SENTINEL

[ICCV 2025] Official repository of "Mitigating Object Hallucinations via Sentence-Level Early Intervention".

multimodal-datasets multimodal-large-language-models preference-alignment image-captioning

Python 11

2 months ago

lujiaying / MUG-Bench

Data and code of the Findings of EMNLP'23 paper MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields

multimodal-datasets multimodal-learning

Python 11

2 years ago

NUSTM / EMDRC

#Data Warehouse# Towards Explainable Multimodal Depression Recognition for Clinical Interviews

mental-health dataset datasets affective-computing multimodal-datasets

8 months ago

clp-research / language-models-multimodal-tasks

Official Git repository for "Hakimov, S., and Schlangen, D., (2023). Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks. Findings of the Associati...

language-model multimodal-datasets multimodal-learning

Python 5

2 years ago