large-vision-language-model

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

instruction-tuning instruction-following large-vision-language-model visual-instruction-tuning multi-modality in-context-learning large-language-models large-vision-language-models multimodal-chain-of-thought multimodal-in-context-learning multimodal-large-language-models chain-of-thought

15.94 k

20 days ago

PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

instruction-tuning large-vision-language-model multi-modal

Python 3.32 k

8 months ago

InternLM / InternLM-XComposer

#Large Language Models# InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

ChatGPT visual-language-learning multi-modality foundation gpt-4 instruction-tuning mllm multimodal vision-language-model language-model large-language-model large-vision-language-model vision-transformer gpt

Python 2.88 k

2 months ago

PKU-YuanGroup / MoE-LLaVA

【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models

large-vision-language-model mixture-of-experts moe multi-modal

Python 2.2 k

16 days ago

yaotingwangofficial / Awesome-MCoT

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

chain-of-thought cot deepseek-r1 instruction-tuning large-vision-language-model multimodal multimodal-chain-of-thought multimodal-large-language-models openai-o1 reasoning survey mcts

731

16 days ago

jqtangust / hawk

🔥 🔥 🔥 [NeurIPS 2024] Official Implementation of Hawk: Learning to Understand Open-World Video Anomalies

anomaly-detection large-vision-language-model video-understanding anomaly Video video-anomaly-detection

Python 215

4 months ago

MMStar-Benchmark / MMStar

#Large Language Models# [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"

evaluation large-language-models large-multimodal-models large-vision-language-model large-vision-language-models large-language-model multimodal multimodal-learning multimodality visual-question-answering

Python 189

10 months ago

yu-rp / apiprompting

[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models

large-multimodal-models large-vision-language-model large-vision-language-models prompting vision-language-model visual-prompting

Python 96

10 months ago

Orlando-CS / Awesome-VLA

✨✨Latest advancements in VLA (Vision-Language-Action) models

large-language-models large-vision-language-model multi-modality

4 months ago

richard-peng-xia / CARES

[NeurIPS'24] CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

large-vision-language-model vision-language-model

Python 74

8 months ago

Ruiyang-061X / VL-Uncertainty

🔎Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".

large-vision-language-model uncertainty-estimation hallucination multi-modal uncertainty uncertainty-quantification vision-language vision-language-model

Python 39

4 months ago

SuperBruceJia / Awesome-Large-Vision-Language-Model

#Natural Language Processing# Awesome Large Vision-Language Model: A Curated List of Large Vision-Language Models

foundation-models large-language-models large-vision-language-model large-vision-language-models multimodal-large-language-models vision-and-language artificial-general-intelligence artificial-intelligence machine-vision deep-learning machine-learning natural-language-processing

8 days ago

ADL-X / LLAVIDAL

This is the official repository of LLAVIDAL

action-recognition large-vision-language-model LLVM

Python 16

4 months ago

ai4ce / LUWA

[CVPR 2024 Highlight] The first benchmark for lithic use-wear analysis leveraging SOTA vision and vision-language models (DINOv2, GPT-4V), demonstrating AI performance surpassing that of expert archae...

ai4science machine-vision large-vision-language-model

Jupyter Notebook 4

4 months ago

lca0503 / MergeToVLRM

Source code of our paper "Transferring Textual Preferences to Vision-Language Understanding through Model Merging", ACL 2025

large-vision-language-model model-merging

Python 4

3 months ago

lucaswychan / quant-lvlm

Easy-to-use large vision language model pipeline for quantitative analysis

large-vision-language-model multimodal-learning PyTorch quantitative-finance

Python 3

3 months ago

amazon-science / THRONE

Code release for THRONE, a CVPR 2024 paper on measuring object hallucinations in LVLM generated text.

benchmark cvpr2024 hallucination hallucinations large-language-model large-language-models large-vision-language-model large-vision-language-models vision-language-model

Python 1

23 days ago

pzrain / DiViCo

Official implementation of TCSVT 2025 paper: DiViCo: Disentangled Visual Token Compression For Efficient Large Vision-Language Model

large-vision-language-model multimodal

Python 0

3 months ago