GitHub 中文社区
vision-language-pretraining

deepseek-ai / Janus

#LLM# Janus-Series: Unified Multimodal Understanding and Generation Models

Tags: any-to-any, foundation-models, large-language-model, multimodal, vision-language-pretraining, unified-model
Python · 17.36k stars · updated 4 months ago
salesforce / LAVIS

#Computer Science# LAVIS - A One-stop Library for Language-Vision Intelligence

Tags: deep-learning, deep-learning-library, image-captioning, salesforce, vision-and-language, vision-framework, vision-language-pretraining, vision-language-transformer, visual-question-answering, multimodal-datasets, multimodal-deep-learning
Jupyter Notebook · 10.63k stars · updated 7 months ago
deepseek-ai / DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Tags: vision-language-model, vision-language-pretraining, foundation-models
Python · 3.88k stars · updated 1 year ago
DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Tags: large-language-models, video-language-pretraining, vision-language-pretraining, blip2, llama, minigpt4, cross-modal-pretraining, multi-modal-chatgpt
Python · 3.02k stars · updated 1 year ago
mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for ...

Tags: chatbot, clip, gpt-4, llama, llava, vicuna, vision-language, vision-language-pretraining
Python · 1.38k stars · updated 3 months ago
Sense-GVT / DeCLIP

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

Tags: big-model, clip, multi-model, self-supervised, vision-language-pretraining, zero-shot
Python · 657 stars · updated 3 years ago
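Several projects in this list (DeCLIP, ProtoCLIP, SegCLIP, the various CLIP wrappers) build on the same contrastive language-image objective: paired image and text embeddings are pulled together while mismatched pairs in the batch are pushed apart. As a rough illustration only — not code from any of these repositories — here is a minimal NumPy sketch of the symmetric InfoNCE loss that CLIP-style pretraining uses; the batch size, embeddings, and temperature are toy values:

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # L2-normalize so the dot product becomes cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (B, B) similarity matrix
    labels = np.arange(len(logits))         # matching pairs lie on the diagonal

    def xent(l):
        # numerically stable cross-entropy of each row against its diagonal label
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image->text and text->image directions
    return (xent(logits) + xent(logits.T)) / 2
```

With perfectly aligned pairs the diagonal dominates and the loss approaches zero; shuffling the text side raises it, which is the signal the encoders are trained on.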
TXH-mercury / VALOR

[TPAMI 2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Tags: vision-language-pretraining
Python · 293 stars · updated 6 months ago
mbzuai-oryx / VideoGPT-plus

Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

Tags: chatbot, clip, gpt4, gpt4o, llama3, llava, multimodal, vicuna, vision-language, vision-language-pretraining
Python · 276 stars · updated 3 months ago
sail-sg / ptp

[CVPR 2023] The code for "Position-guided Text Prompt for Vision-Language Pre-training"

Tags: cross-modality, vision-language-pretraining
Python · 152 stars · updated 2 years ago
jusiro / FLAIR

[MedIA'25] FLAIR: A Foundation LAnguage-Image model of the Retina for fundus image understanding.

Tags: foundation-models, medical-imaging, vision-language-pretraining
Python · 125 stars · updated 2 months ago
Surrey-UP-Lab / RegionSpot

Recognize Any Regions

Tags: auto-labeling, instance-segmentation, object-detection, open-world, vision-language-model, vision-language-pretraining, zero-shot
Python · 122 stars · updated 6 months ago
vgthengane / Continual-CLIP

Official repository for "CLIP model is an Efficient Continual Learner".

Tags: clip, continual-learning, vision-language-pretraining, foundational-models, baseline
Python · 96 stars · updated 3 years ago
ArrowLuo / SegCLIP

PyTorch implementation of ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"

Tags: semantic-segmentation, transfer-learning, vision-language-pretraining, contrastive-learning
Python · 92 stars · updated 2 years ago
HieuPhan33 / CVPR2024_MAVL

Multi-Aspect Vision Language Pretraining - CVPR 2024

Tags: vision-language-model, vision-language-pretraining, zero-shot-classification, zero-shot-segmentation
Python · 78 stars · updated 10 months ago
marslanm / Multimodality-Representation-Learning

This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the survey just accepted https://dl....

Tags: cross-modal, multimodal-datasets, multimodal-deep-learning, multimodal-pre-trained-model, transformer-models, vision-language-pretraining
75 stars · updated 2 years ago
Zoky-2020 / SGA

[ICCV 2023 Oral] Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models.

Tags: vision-language-pretraining
Python · 60 stars · updated 2 years ago
megvii-research / protoclip

📍 Official PyTorch implementation of paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)

Tags: contrastive-learning, self-supervised-learning, vision-language-pretraining
Python · 52 stars · updated 2 years ago
TXH-mercury / COSA

[ICLR 2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Tags: video-captioning, vision-language-pretraining
Python · 43 stars · updated 6 months ago
jaisidhsingh / LoRA-CLIP

Easy wrapper for inserting LoRA layers in CLIP.

Tags: image-text-matching, lora, multimodal, multimodal-deep-learning, parameter-efficient-tuning, vision-language-pretraining
Python · 33 stars · updated 1 year ago
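The LoRA idea behind wrappers like LoRA-CLIP is to freeze a pretrained weight matrix W and learn only a low-rank additive update B·A, so the adapted layer computes y = xWᵀ + (α/r)·xAᵀBᵀ. A generic NumPy sketch of that pattern follows — an illustration of the technique, not the repository's actual API; the class name, rank, and scaling are assumptions:

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (LoRA pattern)."""
    def __init__(self, weight, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = weight                                       # frozen (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, (r, weight.shape[1]))  # trainable down-projection
        self.B = np.zeros((weight.shape[0], r))               # trainable up-projection, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        # base projection plus scaled low-rank correction
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T
```

Because B starts at zero, the wrapped layer initially reproduces the frozen projection exactly, and fine-tuning only has to learn the 2·r·d adapter parameters rather than the full matrix.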
alinlab / b2t

Bias-to-Text: Debiasing Unknown Visual Biases through Language Interpretation

Tags: explainable-ai, vision-language-pretraining
Python · 32 stars · updated 2 years ago