GitHub Chinese Community
#large-multimodal-models

VITA-MLLM / VITA

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

large-multimodal-models, multimodal-large-language-models
Python 2.32k
3 months ago
OpenAdaptAI / OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models

Python, transformers, large-language-models, large-multimodal-models, huggingface, segment-anything, large-action-model, agents, ai-agents, ai-agents-framework, anthropic, google-gemini, openai, ultralytics, computer-use, gpt4o, omniparser
Python 1.3k
3 months ago
NVlabs / describe-anything

Implementation for Describe Anything: Detailed Localized Image and Video Captioning

large-multimodal-models, vision-language-model
Python 1.16k
1 month ago
ShareGPT4Omni / ShareGPT4Video

#Large Language Models# [NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

ChatGPT, gpt, gpt-4v, large-language-models, large-multimodal-models, large-vision-language-models, sora, text-to-video
Python 1.06k
8 months ago
TinyLLaVA / TinyLLaVA_Factory

#Natural Language Processing# A Framework of Small-scale Large Multimodal Models

large-multimodal-models, llama, llava, natural language processing, transformers, vision-language
Python 835
2 months ago
richard-peng-xia / awesome-multimodal-in-medical-imaging

A collection of resources on applications of multi-modal learning in medical imaging.

Medical imaging, multimodal-deep-learning, multimodal-learning, visual-question-answering, large-language-models, large-multimodal-models, multimodal-large-language-models
760
11 days ago
LLaVA-VL / LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

agent, large-language-models, large-multimodal-models, multimodal-large-language-models, tool-use
Python 744
1 year ago
ictnlp / LLaVA-Mini

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

efficient, gpt4o, gpt4v, large-language-models, large-multimodal-models, llava, multimodal, Video, vision, vision-language-model, visual-instruction-tuning, llama, multimodal-large-language-models
Python 488
5 months ago
MMMU-Benchmark / MMMU

#Natural Language Processing# This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

computer vision, deep learning, deep neural networks, evaluation, foundation-models, large-language-models, large-multimodal-models, large language models, machine learning, multimodal, multimodal-deep-learning, multimodal-learning, multimodality, natural language processing, question-answering, STEM, visual-question-answering
Python 440
1 month ago
xiaoachen98 / Open-LLaVA-NeXT

#Large Language Models# An open-source implementation for training LLaVA-NeXT.

chatbot, ChatGPT, gpt-4, gpt4o, large-multimodal-models, llama, llama3, llava, multi-modality, multimodal, vision-language-model, visual-language-learning
Python 398
8 months ago
shikiw / OPERA

#Large Language Models# [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

large-multimodal-models, llama, multimodal, vision-language-model, chatbot, ChatGPT, gpt-4
Python 342
10 months ago
thunlp / LEGENT

Open Platform for Embodied Agents

embodied-ai, language-grounding, large-multimodal-models, physics-engine
Python 319
5 months ago
zjysteven / lmms-finetune

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

finetuning, foundation-models, instruction-tuning, large language models, large-multimodal-models, multimodal, multimodal-large-language-models, vision-language, visual-instruction-tuning, llava
Python 300
4 months ago
JinjieNi / MixEval

The official evaluation suite and dynamic data release for MixEval.

benchmark, evaluation, evaluation-framework, foundation-models, large language models, large-language-models, large-multimodal-models, llm-evaluation, llm-evaluation-framework, llm-inference
Python 242
7 months ago
ShareGPT4Omni / ShareGPT4V

#Large Language Models# [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions

ChatGPT, gpt, gpt-4v, gpt4v, instruction-tuning, language-model, large-language-models, large-multimodal-models, large-vision-language-models, vision-language-model, eccv2024
Python 221
1 year ago
friedrichor / Awesome-Multimodal-Papers

#Computer Science# A curated list of awesome Multimodal studies.

deep learning, large-multimodal-models, multimodal, multimodal-deep-learning, multimodal-large-language-models, multimodal-learning
HTML 194
1 month ago
sshh12 / multi_token

#Large Language Models# Embed arbitrary modalities (images, audio, documents, etc.) into large language models.

large-language-models, llava, large-multimodal-models, multi-modality, multimodal, vision-language-model, large language models
Python 183
1 year ago
MMStar-Benchmark / MMStar

#Large Language Models# [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"

evaluation, large-language-models, large-multimodal-models, large-vision-language-model, large-vision-language-models, large language models, multimodal, multimodal-learning, multimodality, visual-question-answering
Python 181
9 months ago
WisconsinAIVision / YoChameleon

🦎 Yo'Chameleon: Your Personalized Chameleon (CVPR 2025)

chameleon, large-language-models, large-multimodal-models, large language models, lmms, personalization, personalized-generation, cvpr
Python 133
1 month ago
ritzz-ai / GUI-R1

Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents

deep-reinforcement-learning, gui-agent, large-multimodal-models, multimodal, multimodal-large-language-models, grpo, o1
Python 110
1 month ago