vision-language-model

#Large Language Model# LLaVA is a large language-and-vision assistant with GPT-4V-level capabilities

gpt-4 chatbot ChatGPT llama multimodal llava foundation-models instruction-tuning multi-modality visual-language-learning llama-2 llama2 vision-language-model

Python 23.54 k

1 year ago

OpenGVLab / InternVL

#Large Language Model# [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.

image-classification image-text-retrieval large-language-model semantic-segmentation video-classification vision-language-model vit-22b vit-6b multi-modal gpt gpt-4v gpt-4o

Python 9.14 k

10 days ago

QwenLM / Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

large-language-models vision-language-model

Python 6.22 k

1 year ago

jingyaogong / minimind-v

#Large Language Model# 🚀 Train a 26M-parameter vision-language model (VLM) from scratch in just 1 hour!

artificial-intelligence ChatGPT vision-language-model

Python 4.66 k

5 months ago

PKU-Alignment / align-anything

Align Anything: Training All-modality Model with Feedback

large-language-models multimodal rlhf chameleon dpo vision-language-model

Jupyter Notebook 4.54 k

21 days ago

deepseek-ai / DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

vision-language-model vision-language-pretraining foundation-models

Python 3.96 k

1 year ago

dvlab-research / MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

generation large-language-models vision-language-model

Python 3.31 k

1 year ago

MiniMax-AI / MiniMax-01

#Large Language Model# The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention

large-language-models large-language-model vision-language-model vlm

Python 3.14 k

2 months ago

jingyi0000 / VLM_survey

#Computer Science# Collection of AWESOME vision-language models for vision tasks

computer-vision deep-learning knowledge-distillation survey transfer-learning vision-language-model clip

2.92 k

4 months ago

InternLM / InternLM-XComposer

#Large Language Model# InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

ChatGPT visual-language-learning multi-modality foundation gpt-4 instruction-tuning mllm multimodal vision-language-model language-model large-language-model large-vision-language-model vision-transformer gpt

Python 2.89 k

4 months ago

BAAI-Agents / Cradle

#Large Language Model# The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...

ai-agent ai-agents-framework computer-control cradle gcc generative-ai grounding large-language-models large-language-model lmm multimodality vision-language-model vlm artificial-intelligence

Python 2.27 k

10 months ago

illuin-tech / colpali

The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

information-retrieval retrieval-augmented-generation vision-language-model

Python 2.21 k

13 days ago

AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

C++ 1.77 k

5 months ago

Blaizzy / mlx-vlm

#Large Language Model# MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

llava large-language-model MLX vision-transformer apple-silicon idefics local-ai paligemma vision-framework vision-language-model florence2 molmo pixtral

Python 1.63 k

11 days ago

showlab / ShowUI

[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

computer-use vision-language-model agent gui-agent

Python 1.47 k

4 months ago

ByteDance-Seed / Seed1.5-VL

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

cookbook large-language-model multimodal-large-language-models vision-language-model

Jupyter Notebook 1.43 k

3 months ago