GitHub Chinese Community

vision-language-model

haotian-liu / LLaVA

#Large language model# LLaVA is a large language-and-vision assistant with GPT-4V-level capabilities

gpt-4, chatbot, ChatGPT, llama, multimodal, llava, foundation-models, instruction-tuning, multi-modality, visual-language-learning, llama-2, llama2, vision-language-model
Python 22.78 k
10 months ago
OpenGVLab / InternVL

#Large language model# [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance

image-classification, image-text-retrieval, large-language-model, semantic-segmentation, video-classification, vision-language-model, vit-22b, vit-6b, multi-modal, gpt, gpt-4v, gpt-4o
Python 8.33 k
17 days ago
QwenLM / Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

large-language-models, vision-language-model
Python 5.98 k
10 months ago
PKU-Alignment / align-anything

Align Anything: Training All-modality Model with Feedback

large-language-models, multimodal, rlhf, chameleon, dpo, vision-language-model
Jupyter Notebook 3.93 k
18 days ago
deepseek-ai / DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

vision-language-model, vision-language-pretraining, foundation-models
Python 3.88 k
1 year ago
jingyaogong / minimind-v

#Large language model# 🚀 [Large model] Train a 26M-parameter visual multimodal VLM from scratch in just 1 hour! 🌏

artificial-intelligence, ChatGPT, vision-language-model
Python 3.78 k
2 months ago
dvlab-research / MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

generation, large-language-models, vision-language-model
Python 3.28 k
1 year ago
InternLM / InternLM-XComposer

#Large language model# InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

ChatGPT, visual-language-learning, multi-modality, foundation, gpt-4, instruction-tuning, mllm, multimodal, vision-language-model, language-model, large-language-model, large-vision-language-model, vision-transformer, gpt
Python 2.84 k
20 days ago
MiniMax-AI / MiniMax-01

#Large language model# The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on Linear Attention

large-language-models, large-language-model, vision-language-model, vlm
Python 2.79 k
5 days ago
jingyi0000 / VLM_survey

#Computer science# Collection of AWESOME vision-language models for vision tasks

machine-vision, deep-learning, knowledge-distillation, survey, transfer-learning, vision-language-model, clip
2.77 k
21 days ago
BAAI-Agents / Cradle

#Large language model# The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...

ai-agent, ai-agents-framework, computer-control, cradle, gcc, generative-ai, grounding, large-language-models, large-language-model, lmm, multimodality, vision-language-model, vlm, artificial-intelligence
Python 2.11 k
7 months ago
illuin-tech / colpali

The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

information-retrieval, retrieval-augmented-generation, vision-language-model
Python 1.94 k
5 days ago
AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

artificial-intelligence, documentai, multimodal, multimodal-deep-learning, OCR, machine-vision, vision-language-transformer, end-to-end-ocr, scene-text-detection, scene-text-detection-recognition, scene-text-recognition, text-detection, text-recognition, vision-language, document, document-analysis, document-recognition, document-understanding, document-intelligence, vision-language-model
C++ 1.73 k
2 months ago
Blaizzy / mlx-vlm

#Large language model# MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

llava, large-language-model, MLX, vision-transformer, apple-silicon, idefics, local-ai, paligemma, vision-framework, vision-language-model, florence2, molmo, pixtral
Python 1.33 k
7 days ago
NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

image-captioning, language-model, multi-modal-learning, multi-task-learning, vision-language-model, vision-and-language, vqa
Python 1.31 k
1 year ago
showlab / ShowUI

[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

computer-use, vision-language-model, agent, gui-agent
Python 1.3 k
17 days ago
emcf / thepipe

#Web crawler# Get clean data from tricky documents, powered by vision-language models ⚡

multimodal, pdf, vision-transformer, large-language-models, Web, document, openai, Python, scraping, vision-language-model, structured-data, unstructured-data
Python 1.27 k
13 days ago
ByteDance-Seed / Seed1.5-VL

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

cookbook, large-language-model, multimodal-large-language-models, vision-language-model
Jupyter Notebook 1.21 k
25 days ago
llm-jp / awesome-japanese-llm

#Large language model# Overview of Japanese LLMs (日本語LLMまとめ)

language-model, language-models, large-language-model, large-language-models, japanese, japanese-language, vision-and-language, foundation-models, multimodal, vision-language, vision-language-model, generative-ai, generative-model, generative-models
TypeScript 1.18 k
14 days ago
NVlabs / describe-anything

Implementation for Describe Anything: Detailed Localized Image and Video Captioning

large-multimodal-models, vision-language-model
Python 1.16 k
1 month ago