# vision-language-model

https://static.github-zh.com/github_avatars/OpenGVLab?size=40

#Large Language Model# [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.

Python 9.14 k
10 days ago
https://static.github-zh.com/github_avatars/QwenLM?size=40

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 6.22 k
1 year ago
https://static.github-zh.com/github_avatars/jingyaogong?size=40

#Large Language Model# 🚀 Train a 26M-parameter vision multimodal VLM from scratch in just 1 hour!

Python 4.66 k
5 months ago
https://static.github-zh.com/github_avatars/PKU-Alignment?size=40
Jupyter Notebook 4.54 k
21 days ago
https://static.github-zh.com/github_avatars/deepseek-ai?size=40

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Python 3.96 k
1 year ago
https://static.github-zh.com/github_avatars/dvlab-research?size=40

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Python 3.31 k
1 year ago
https://static.github-zh.com/github_avatars/MiniMax-AI?size=40

#Large Language Model# The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on Linear Attention

Python 3.14 k
2 months ago
https://static.github-zh.com/github_avatars/BAAI-Agents?size=40

#Large Language Model# The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...

Python 2.27 k
10 months ago
https://static.github-zh.com/github_avatars/illuin-tech?size=40

The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

Python 2.21 k
13 days ago
https://static.github-zh.com/github_avatars/Blaizzy?size=40
Python 1.63 k
11 days ago
https://static.github-zh.com/github_avatars/showlab?size=40

[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

Python 1.47 k
4 months ago
https://static.github-zh.com/github_avatars/ByteDance-Seed?size=40

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1.43 k
3 months ago
https://static.github-zh.com/github_avatars/AIDC-AI?size=40

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

Python 1.34 k
7 days ago
https://static.github-zh.com/github_avatars/NVlabs?size=40

[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning

Python 1.33 k
3 months ago
https://static.github-zh.com/github_avatars/NVlabs?size=40
Python 1.31 k
2 years ago