GitHub Chinese Community
#multimodal-large-language-models

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

instruction-tuning, instruction-following, large-vision-language-model, visual-instruction-tuning, multi-modality, in-context-learning, large-language-models, large-vision-language-models, multimodal-chain-of-thought, multimodal-in-context-learning, multimodal-large-language-models, chain-of-thought
15.53 k
3 days ago
X-PLUG / MobileAgent

#Android# Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

agent, gpt4v, mllm, mobile-agents, multimodal, multimodal-large-language-models, multimodal-agent, Android, App, GUI, mobile automation, copilot, harmony, iOS
Python 4.34 k
10 days ago
joanrod / star-vector

#LLM# StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textu...

LLM, multimodal-large-language-models, SVG, vlm
Python 3.88 k
2 months ago
modelscope / modelscope-agent

#LLM# ModelScope-Agent: An agent framework connecting models in ModelScope with the world

agent, gpts, chatglm-4, LLM, qwen, open-gpts, multi-agents, mobile-agent, assistant, api, chatbot, Android, mobile-agents, multimodal-large-language-models, rag, Code, data science
Python 3.18 k
4 days ago
ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

large-language-models, multimodal-large-language-models, speech-to-text
Python 2.94 k
1 month ago
VITA-MLLM / VITA

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

large-multimodal-models, multimodal-large-language-models
Python 2.32 k
3 months ago
X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

chart-understanding, document-understanding, mllm, multimodal, multimodal-large-language-models, table-understanding
Python 2.2 k
17 days ago
cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

chatbot, clip, computer vision, dino, instruction-tuning, large-language-models, LLM, mllm, multimodal-large-language-models, representation-learning
Python 1.91 k
8 months ago
YangLing0818 / RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

large-language-models, multimodal-large-language-models, image-editting, text-to-image
Jupyter Notebook 1.81 k
4 months ago
ByteDance-Seed / Seed1.5-VL

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

cookbook, LLM, multimodal-large-language-models, vision-language-model
Jupyter Notebook 1.21 k
1 month ago
BAAI-DCAI / Bunny

#LLM# A family of lightweight multimodal models.

mllm, ChatGPT, gpt-4, multimodal-large-language-models, vlm, Chinese, english
Python 1.02 k
7 months ago
Henry-23 / VideoChat

Real-time voice-interactive digital human, supporting an end-to-end voice pipeline (GLM-4-Voice - THG) and a cascaded pipeline (ASR-LLM-TTS-THG). Appearance and voice timbre are customizable with no training required, voice cloning is supported, and first-packet latency is as low as 3 s.

dialogue-systems, real-time, digital-human, lip-sync, musetalk, streaming, talking-head, asr, tts, end-to-end, multimodal-large-language-models
Python 967
3 months ago
AIDC-AI / Ovis

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

chatbot, llama3, multimodal, multimodal-large-language-models, multimodality, qwen, vision-language-model
Python 933
3 months ago
X-LANCE / SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

audio-processing, LLM, multimodal-large-language-models, peft, speech-processing
Python 826
2 months ago
richard-peng-xia / awesome-multimodal-in-medical-imaging

A collection of resources on applications of multi-modal learning in medical imaging.

Medical imaging, multimodal-deep-learning, multimodal-learning, visual-question-answering, large-language-models, large-multimodal-models, multimodal-large-language-models
760
11 days ago
LLaVA-VL / LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

agent, large-language-models, large-multimodal-models, multimodal-large-language-models, tool-use
Python 744
1 year ago
deepglint / unicom

Large-Scale Visual Representation Model

vision-transformer, large-language-models, multimodal-large-language-models
Python 687
1 month ago
yaotingwangofficial / Awesome-MCoT

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

chain-of-thought, cot, deepseek-r1, instruction-tuning, large-vision-language-model, multimodal, multimodal-chain-of-thought, multimodal-large-language-models, openai-o1, reasoning, survey, mcts
642
1 month ago
VITA-MLLM / Woodpecker

#LLM# ✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

hallucination, hallucinations, large-language-models, LLM, mllm, multimodal-large-language-models, multimodality
Python 636
6 months ago
rese1f / MovieChat

[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

computer vision, multimodal-large-language-models, llama, large-language-models, dataset
Python 626
5 months ago