GitHub 中文社区

©2025 GitHub中文社区


Topic: lmm
BAAI-Agents / Cradle

#LLM# The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to tackle any computer task through strong reasoning abilities, self-improvement, and skill curation, ...

ai-agent, ai-agents-framework, computer-control, cradle, gcc, generative-ai, grounding, large-language-models, lmm, multimodality, vision-language-model, vlm, artificial-intelligence
Python · 2.22k stars
9 months ago
mbzuai-oryx / groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

foundation-models, lmm, vision-and-language, vision-language-model, llm-agent
Python · 901 stars
2 months ago
NVlabs / EAGLE

#LLM# Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs

demo, gpt4, huggingface, llama, llama3, llava, lmm, mllm, large-language-model, large-language-models
Python · 841 stars
3 months ago
LLaVA-VL / LLaVA-Interactive-Demo

LLaVA-Interactive-Demo

lmm, multimodal
Python · 375 stars
1 year ago
tianyi-lab / HallusionBench

#LLM# [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

benchmark, vlms, gpt-4, gpt-4v, llava, benchmarks, hallucination, large-language-model, lmm, large-language-models, large-vision-language-models
Python · 293 stars
9 months ago
CircleRadon / TokenPacker

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025

connector, lmm, mllm
Python · 260 stars
2 months ago
mbzuai-oryx / Video-LLaVA

#LLM# PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

large-language-model, lmm, video, grounding, transcription
Python · 257 stars
2 years ago
Javis603 / Discord-AIBot

#LLM# 🤖 Discord AI assistant with OpenAI, Gemini, Claude & DeepSeek integration, multilingual support, multimodal chat, image generation, web search, and deep thinking | A powerful Discord AI assistant integrating multiple top AI models, supporting...

artificial-intelligence, chatbot, chatgpt, claude, deepseek, discord, discord-bot, discord.js, gemini, large-language-model, lmm, node.js, openai, xai
JavaScript · 242 stars
5 months ago
TIGER-AI-Lab / Mantis

Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]

language, vision, lmm, mllm, video, vlm, multimodal
Python · 221 stars
4 months ago
TideDra / VL-RLHF

#LLM# An RLHF infrastructure for vision-language models

dpo, large-language-model, lmm, mllm, rlhf, vlm
Python · 179 stars
9 months ago
xieyuquanxx / awesome-Large-MultiModal-Hallucination

😎 A curated list of awesome LMM hallucination papers, methods & resources.

hallucination, multi-modal, lmm, multimodal
149 stars
1 year ago
Q-Future / A-Bench

[ICLR 2025] What do we expect from LMMs as AIGI evaluators and how do they perform?

evaluation, lmm
143 stars
6 months ago
Chenyu-Wang567 / MLLM-Tool

#LLM# MLLM-Tool: A Multimodal Large Language Model for Tool Agent Learning

gpt4, large-language-model, lmm
Python · 127 stars
1 year ago
graphic-design-ai / graphist

#LLM# Official repo of Graphist

graphic-design, large-language-model, lmm, mllm
122 stars
1 year ago
WisconsinAIVision / YoLLaVA

#LLM# 🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant

llava, large-language-model, lmm, lmms, personalization, neurips
Python · 111 stars
4 months ago
mbzuai-oryx / VideoGLaMM

[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

foundation-models, llm-agent, lmm, vision-and-language, vision-language-model
Python · 74 stars
4 months ago
uni-medical / GMAI-MMBench

#LLM# GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.

benchmark, large-language-model, lmm, medical, vlm
70 stars
7 months ago
yisuanwang / Idea23D

[COLING 2025] Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs

3d, aigc, agent, lmm
Jupyter Notebook · 51 stars
6 months ago
mapluisch / LLaVA-CLI-with-multiple-images

LLaVA inference with multiple images at once for cross-image analysis.

image-processing, inference, llama2, llava, python, lmm, lmms, pillow, pytorch, visual-question-answering, vqa
Python · 50 stars
1 year ago
Haochen-Wang409 / TreeVGR

#LLM# Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"

grounding, grpo, large-language-model, lmm, mllm, o3, rl
Python · 48 stars
18 days ago