GitHub Chinese Community
#large-vision-language-models

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

instruction-tuning, instruction-following, large-vision-language-model, visual-instruction-tuning, multi-modality, in-context-learning, large-language-models, large-vision-language-models, multimodal-chain-of-thought, multimodal-in-context-learning, multimodal-large-language-models, chain-of-thought
15.53 k
3 days ago
ShareGPT4Omni / ShareGPT4Video

#Large Language Models# [NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

ChatGPT, gpt, gpt-4v, large-language-models, large-multimodal-models, large-vision-language-models, sora, text-to-video
Python 1.06 k
8 months ago
NVlabs / DoRA

#Computer Science# [ICML 2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation

commonsense-reasoning, deep learning, deep neural networks, instruction-tuning, large-language-models, large-vision-language-models, lora, parameter-efficient-fine-tuning, parameter-efficient-tuning, vision-and-language
Python 797
8 months ago
MME-Benchmarks / Video-MME

✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

large-language-models, large-vision-language-models, mme, multimodal-large-language-models, Video
563
1 month ago
YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

#Large Language Models# 🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).

aigc, large-language-models, large-vision-language-models, multimodal-generation, multimodal-large-language-models, multimodal-models, multimodality, text-to-3d, text-to-audio, text-to-image, text-to-speech, text-to-video, large language models, mllm
HTML 478
2 months ago
Paranioar / Awesome_Matching_Pretraining_Transfering

#Awesome#The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insigh...

cross-modal-retrieval, tutorial, Awesome Lists, image-text-matching, image-text-retrieval, large-language-models, large-vision-language-models, multimodal-pretraining, parameter-efficient-fine-tuning, vision-and-language, multimodal-large-language-models, large language models, text-to-image-generation, text-to-image-synthesis, text-to-video-generation
423
6 months ago
burglarhobbit / Awesome-Medical-Large-Language-Models

Curated papers on Large Language Models in Healthcare and Medical domain

large-language-models, multimodal-large-language-models, large-vision-language-models
319
17 days ago
tianyi-lab / HallusionBench

#Large Language Models# [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

benchmark, vlms, gpt-4, gpt-4v, llava, benchmarks, hallucination, large language models, lmm, large-language-models, large-vision-language-models
Python 284
7 months ago
zhaochen0110 / Awesome_Think_With_Images

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

large-vision-language-models
230
8 days ago
ShareGPT4Omni / ShareGPT4V

#Large Language Models# [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions

ChatGPT, gpt, gpt-4v, gpt4v, instruction-tuning, language-model, large-language-models, large-multimodal-models, large-vision-language-models, vision-language-model, eccv2024
Python 221
1 year ago
khuangaf / Awesome-Chart-Understanding

#Awesome#A curated list of recent and past chart understanding work based on our IEEE TKDE survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models...

Awesome Lists, chart-understanding, large-vision-language-models
208
2 months ago
MMStar-Benchmark / MMStar

#Large Language Models# [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"

evaluation, large-language-models, large-multimodal-models, large-vision-language-model, large-vision-language-models, large language models, multimodal, multimodal-learning, multimodality, visual-question-answering
Python 181
9 months ago
NishilBalar / Awesome-LVLM-Hallucination

#Large Language Models# An up-to-date curated list of state-of-the-art research, papers, and resources on hallucination in large vision-language models

hallucination, large-vision-language-models, multimodal-large-language-models, large-language-models, large language models, mllm
137
1 month ago
llmbev / talk2bev

Talk2BEV: Language-Enhanced Bird's Eye View Maps (ICRA'24)

autonomous-driving, gpt-4, large-language-models, large-vision-language-models, occupancy-grid-map
Python 110
7 months ago
yu-rp / apiprompting

[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models

large-multimodal-models, large-vision-language-model, large-vision-language-models, prompting, vision-language-model, visual-prompting
Python 90
8 months ago
mbzuai-oryx / GeoPixel

GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing is specifically developed for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding capabi...

foundation-models, large-multimodal-models, large-vision-language-models, remote-sensing, segmentation-models
Python 88
18 days ago
yfzhang114 / LLaVA-Align

This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strategy.

hallucination, large-vision-language-models
Python 78
4 months ago
ys-zong / VLGuard

[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.

alignment, large-language-models, large-vision-language-models, safety, vision-language-model
Python 72
5 months ago
FudanDISC / ReForm-Eval

#Large Language Models# A benchmark for evaluating the capabilities of large vision-language models (LVLMs)

gpt4, instruction-tuning, large-language-models, large language models, multimodal, pre-training, large-vision-language-models, benchmark, embodied-ai, in-context-learning, instruction-following, multimodal-chain-of-thought
Python 46
2 years ago
Ruiyang-061X / Awesome-MLLM-Uncertainty

✨A curated list of papers on the uncertainty in multi-modal large language model (MLLM).

large-language-models, large-vision-language-models, mllm, multi-modal, uncertainty, uncertainty-estimation, uncertainty-quantification
45
2 months ago