# vision-language-pretraining

https://static.github-zh.com/github_avatars/deepseek-ai?size=40
Python 17.54 k
7 months ago
https://static.github-zh.com/github_avatars/deepseek-ai?size=40

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Python 3.96 k
1 year ago
https://static.github-zh.com/github_avatars/DAMO-NLP-SG?size=40
Python 3.06 k
1 year ago
https://static.github-zh.com/github_avatars/mbzuai-oryx?size=40

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for ...

Python 1.43 k
1 month ago
https://static.github-zh.com/github_avatars/Sense-GVT?size=40

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

Python 666
3 years ago
https://static.github-zh.com/github_avatars/TXH-mercury?size=40

[TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Python 300
9 months ago
https://static.github-zh.com/github_avatars/mbzuai-oryx?size=40

Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

Python 286
1 month ago
https://static.github-zh.com/github_avatars/sail-sg?size=40

[CVPR2023] The code for "Position-guided Text Prompt for Vision-Language Pre-training"

Python 152
2 years ago
https://static.github-zh.com/github_avatars/jusiro?size=40

[MedIA'25] FLAIR: A Foundation LAnguage-Image model of the Retina for fundus image understanding.

Python 142
3 months ago
https://static.github-zh.com/github_avatars/BridgeVLA?size=40
Python 128
3 months ago
https://static.github-zh.com/github_avatars/vgthengane?size=40

Official repository for "CLIP model is an Efficient Continual Learner".

Python 99
3 years ago
https://static.github-zh.com/github_avatars/ArrowLuo?size=40

PyTorch implementation of ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"

Python 93
2 years ago
https://static.github-zh.com/github_avatars/marslanm?size=40

This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the survey just accepted https://dl....

80
3 months ago
https://static.github-zh.com/github_avatars/Zoky-2020?size=40

Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models. [ICCV 2023 Oral]

Python 64
2 years ago
https://static.github-zh.com/github_avatars/megvii-research?size=40

📍 Official pytorch implementation of paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)

Python 53
2 years ago
https://static.github-zh.com/github_avatars/TXH-mercury?size=40

[ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Python 43
9 months ago