GitHub Chinese Community



vision-and-language

aishwaryanr / awesome-generative-ai-guide

[interview] A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!

Tags: awesome-lists, generative-ai, interview, large-language-models, notebook-jupyter, vision-and-language
12.86k stars · updated 8 days ago
salesforce / LAVIS

[computer science] LAVIS: A One-stop Library for Language-Vision Intelligence

Tags: deep-learning, deep-learning-library, image-captioning, salesforce, vision-and-language, vision-framework, vision-language-pretraining, vision-language-transformer, visual-question-answering, multimodal-datasets, multimodal-deep-learning
Jupyter Notebook · 10.63k stars · updated 7 months ago
roboflow / maestro

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.

Tags: captioning, fine-tuning, florence-2, multimodal, object-detection, paligemma, phi-3-vision, transformers, vision-and-language, vqa, qwen2-vl
Python · 2.57k stars · updated 6 days ago
om-ai-lab / OmAgent

[LLM] Build multimodal language agents for fast prototyping and production

Tags: large-language-models, multimodal-agent, vision-and-language, agent, workflow, chatbot, gpt4, llm, multimodal, rag, vlm, gpt, gradio, llama, llava, openai, python, gemini
Python · 2.51k stars · updated 3 months ago
salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method

Tags: vision-and-language, representation-learning, contrastive-learning
Python · 1.66k stars · updated 3 years ago
open-mmlab / Multimodal-GPT

Multimodal-GPT

Tags: flamingo, gpt, gpt-4, llama, multimodal, transformer, vision-and-language
Python · 1.5k stars · updated 2 years ago
dandelin / ViLT

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

Tags: vision-and-language
Python · 1.47k stars · updated 1 year ago
om-ai-lab / OmDet

Real-time and accurate open-vocabulary end-to-end object detection

Tags: object-detection, vision-and-language, zero-shot-object-detection, computer-vision, zero-shot, coco, real-time
Python · 1.32k stars · updated 6 months ago
NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Tags: image-captioning, language-model, multi-modal-learning, multi-task-learning, vision-language-model, vision-and-language, vqa
Python · 1.31k stars · updated 1 year ago
llm-jp / awesome-japanese-llm

[LLM] 日本語LLMまとめ - Overview of Japanese LLMs

Tags: language-model, language-models, llm, large-language-models, japanese, japanese-language, vision-and-language, foundation-models, multimodal, vision-language, vision-language-model, generative-ai, generative-model, generative-models
TypeScript · 1.18k stars · updated 15 days ago
yuewang-cuhk / awesome-vision-language-pretraining-papers

Recent Advances in Vision and Language PreTrained Models (VL-PTMs)

Tags: vision-and-language, pretraining, multimodal-deep-learning, bert
1.15k stars · updated 3 years ago
microsoft / Oscar

Oscar and VinVL

Tags: vision-and-language, pre-training, image-captioning, vqa, oscar
Python · 1.05k stars · updated 2 years ago
rhymes-ai / Aria

Codebase for Aria - an Open Multimodal Native MoE

Tags: mixture-of-experts, multimodal, vision-and-language
Jupyter Notebook · 1.05k stars · updated 5 months ago
OFA-Sys / ONE-PEACE

A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Tags: foundation-models, multimodal, representation-learning, vision-language, audio-language, vision-and-language, vision-transformer, contrastive-loss
Python · 1.04k stars · updated 8 months ago
YehLi / xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning).

Tags: image-captioning, video-captioning, vision-and-language, pretraining, cross-modal-retrieval, visual-question-answering, tden
Python · 969 stars · updated 2 years ago
mbzuai-oryx / groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Tags: foundation-models, lmm, vision-and-language, vision-language-model, llm-agent
Python · 888 stars · updated 7 days ago
26hzhang / DL-NLP-Readings

[NLP] My Reading Lists of Deep Learning and Natural Language Processing

Tags: deep-learning, natural-language-processing, reinforcement-learning, commonsense, language-model, robotics, machine-learning, vision-and-language
TeX · 849 stars · updated 3 years ago
SunzeY / AlphaCLIP

[computer science] [CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Tags: deep-learning, machine-learning, vision-language, vision-language-model, vision-transformer, vision-and-language
Jupyter Notebook · 822 stars · updated 13 days ago
OpenRobotLab / PointLLM

[ECCV 2024 Best Paper Candidate] PointLLM: Empowering Large Language Models to Understand Point Clouds

Tags: 3d, chatbot, foundation-models, gpt-4, large-language-models, llama, multimodal, point-cloud, representation-learning, vision-and-language
Python · 818 stars · updated 25 days ago
NVlabs / DoRA

[computer science] [ICML 2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation

Tags: commonsense-reasoning, deep-learning, deep-neural-networks, instruction-tuning, large-language-models, large-vision-language-models, lora, parameter-efficient-fine-tuning, parameter-efficient-tuning, vision-and-language
Python · 797 stars · updated 8 months ago