multi-modal-learning · GitHub Topics

mlfoundations / open_clip

#Computer Science#An open source implementation of CLIP.

deep-learning PyTorch computer-vision language-model multi-modal-learning contrastive-loss zero-shot-classification pretrained-models

Python 12.29 k

8 days ago

OFA-Sys / Chinese-CLIP

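#Natural Language Processing#This project is the Chinese version of the CLIP model, trained on large-scale Chinese data (~200 million image-text pairs). It aims to help users quickly implement tasks in the Chinese domain such as image-text feature and similarity computation, cross-modal retrieval, and zero-shot image classification.

Chinese computer-vision multi-modal-learning natural-language-processing PyTorch vision-and-language-pre-training image-text-retrieval clip pretrained-models vision-language deep-learning multi-modal contrastive-loss transformers coreml-models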

Python 5.43 k

1 year ago

lyuchenyang / Macaw-LLM

#Natural Language Processing#Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

language-model multi-modal-learning natural-language-processing deep-learning machine-learning neural-networks

Python 1.58 k

7 months ago

NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

image-captioning language-model multi-modal-learning multi-task-learning vision-language-model vision-and-language vqa

Python 1.31 k

2 years ago

lucidrains / x-clip

#Computer Science#A concise but complete implementation of CLIP with various experimental improvements from recent papers

artificial-intelligence deep-learning contrastive-learning zero-shot-learning multi-modal-learning

Python 711

2 years ago

jokieleung / awesome-visual-question-answering

#Awesome#A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

Awesome Lists vqa multi-modal multi-modal-learning

665

2 years ago

InternRobotics / EmbodiedScan

[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

3d-vision computer-vision multi-modal-learning Robotics

Python 612

2 months ago

kyegomez / zeta

#Computer Science#Build high-performance AI models with modular building blocks

artificial-intelligence multi-modal transformers deep-learning gpt4 llama2 multi-agent-systems multi-modal-learning multi-platform PyTorch speech-recognition transformer

Python 538

3 days ago

DmitryRyumin / CVPR-2023-24-Papers

#Face Recognition#CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included...

action-recognition autonomous-driving biometrics computer-vision cvpr cvpr2023 dataset deep-learning face-recognition gesture-recognition image-synthesis medical-image-processing multi-modal-learning pattern-recognition segmentation self-supervised-learning video-synthesis cvpr2024

Python 451

1 year ago

zjukg / KG-MM-Survey

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

cross-modal-retrieval Entity resolution image-classification image-generation information-extraction knowledge-graph knowledge-graph-embeddings large-language-models multi-modal-learning paper-list survey surveys visual-question-answering awsome

432

8 months ago

zhengli97 / PromptKD

[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"

cvpr2024 multi-modal-learning prompt-learning vision-language-model knowledge-distillation clip

Python 322

1 day ago

Ysz2022 / NeRCo

[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement

neural-representation multi-modal-learning iccv iccv2023

Python 248

1 year ago

moabarar / nemar

#Computer Science#[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation

multimodal image-to-image-translation multi-modal multi-modal-learning affine-transformation deep-learning cnn PyTorch image-registration cvpr2020

Python 186

5 years ago

huggingface / chug

#Data Warehouse#Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.

computer-vision dataset distributed-training document-understanding multi-modal-learning pdf-document

Python 158

1 year ago

GuanRunwei / Achelous

The official repository of Achelous and Achelous++

multi-modal-learning multi-task-learning object-detection object-tracking point-cloud-segmentation semantic-segmentation

Python 155

1 year ago

qizekun / ReCon

[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

Point cloud multi-modal-learning representation-learning self-supervised-learning

Python 145

1 year ago

kkakkkka / ETRIS

#Computer Science#[ICCV-2023] The official code of Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation

deep-learning deep-neural-networks machine-learning multi-modal-learning segmentation

Python 134

1 month ago

wjun0830 / CGDETR

Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"

computer-vision detr multi-modal-learning PyTorch video-understanding

Python 134

1 year ago

shikras / d-cube

A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).

multi-modal-learning object-detection referring-expression-comprehension vision-language dataset open-vocabulary-detection

Python 128

1 year ago

924973292 / EDITOR

[CVPR2024] Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

cvpr2024 multi-modal-learning person-reid reid multi-modal

Python 108

9 months ago