mllm

microsoft / unilm

#NLP# Unilm is a large-scale self-supervised pre-trained model spanning tasks, languages, and modalities.

nlp, pre-trained-model, unilm, minilm, layoutlm, layoutxlm, beit, document-ai, trocr, beit-3, foundation-models, xlm-e, deepnet, llm, multimodal, mllm, kosmos, kosmos-1, textdiffuser, bitnet
Python · 21.39k stars
Updated 12 days ago

simular-ai / Agent-S

Agent S: an open agentic framework that uses computers like a human

agent-computer-interface, ai-agents, computer-automation, gui-agents, memory, mllm, planning, retrieval-augmented-generation, in-context-reinforcement-learning, computer-use, grounding
Python · 5.44k stars
Updated 5 days ago

X-PLUG / MobileAgent

#Android# Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

agent, gpt4v, mllm, mobile-agents, multimodal, multimodal-large-language-models, multimodal-agent, Android, App, GUI, mobile-automation, copilot, harmony, iOS
Python · 4.33k stars
Updated 9 days ago

NExT-GPT / NExT-GPT

#LLM# Code and models for the ICML 2024 paper "NExT-GPT: Any-to-Any Multimodal Large Language Model"

ChatGPT, foundation-models, gpt-4, instruction-tuning, large-language-models, llm, multi-modal-chatgpt, multimodal, visual-language-learning, mllm
Python · 3.51k stars
Updated 1 month ago

ant-research / MagicQuill

[CVPR'25] Official implementation of the paper "MagicQuill: An Intelligent Interactive Image Editing System"

aigc, image-editing, mllm, gradio
Python · 3.44k stars
Updated 2 months ago

manycore-research / SpatialLM

SpatialLM: Training Large Language Models for Structured Indoor Modeling

mllm
Python · 3.26k stars
Updated 5 days ago

atfortes / Awesome-LLM-Reasoning

#LLM# Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓

language-models, reasoning, prompt, in-context-learning, ChatGPT, chain-of-thought, prompt-engineering, cot, Awesome Lists, gpt, mllm, multimodal, papers, gpt-4o, openai-o1, strawberry, deepseek, deepseek-r1
3.15k stars
Updated 1 month ago

InternLM / InternLM-XComposer

#LLM# InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

ChatGPT, visual-language-learning, multi-modality, foundation, gpt-4, instruction-tuning, mllm, multimodal, vision-language-model, language-model, llm, large-vision-language-model, vision-transformer, gpt
Python · 2.84k stars
Updated 20 days ago

X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

chart-understanding, document-understanding, mllm, multimodal, multimodal-large-language-models, table-understanding
Python · 2.2k stars
Updated 16 days ago

cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

chatbot, clip, computer-vision, dino, instruction-tuning, large-language-models, llm, mllm, multimodal-large-language-models, representation-learning
Python · 1.91k stars
Updated 8 months ago

coderonion / awesome-yolo-object-detection

#Data Warehouse# 🚀🚀🚀 A collection of awesome public YOLO-series object detection projects and related object detection datasets.

yolo, yolov5, tensorrt, object-detection, yolov8, CUDA, llm, llama, vlm, dataset, deepseek, GUI, mllm, qwen
1.5k stars
Updated 15 days ago

magic-research / Sa2VA

🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

computer-vision, mllm, large-language-models
Python · 1.12k stars
Updated 22 days ago

BAAI-DCAI / Bunny

#LLM# A family of lightweight multimodal models.

mllm, ChatGPT, gpt-4, multimodal-large-language-models, vlm, chinese, english
Python · 1.02k stars
Updated 7 months ago

CircleRadon / Osprey

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

mllm, sam, visual-instruction-tuning, pixel-understanding
Python · 821 stars
Updated 2 months ago

NVlabs / EAGLE

#LLM# Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs

Demo, gpt4, huggingface, llama, llama3, llava, lmm, mllm, llm, large-language-models
Python · 792 stars
Updated 2 months ago

coderonion / awesome-llm-and-aigc

#Data Warehouse# 🚀🚀🚀 A collection of awesome public projects about Large Language Models (LLM), Vision Language Models (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), the related Datasets and Applic...

gpt, llm, Awesome Lists, llama, aigc, langchain, dataset, yolo, triton, CUDA, vlm, deepseek, qwen, mllm, ai4science, reinforcement-learning, qwen3
706 stars
Updated 1 month ago

taco-group / OpenEMMA

#Algorithm Practice# OpenEMMA, a permissively licensed open-source "reproduction" of Waymo’s EMMA model.

algorithms, artificial-intelligence, autonomous-driving, autonomous-vehicles, autonomy, generative-ai, machine-learning, mllm, Network, perception
Python · 697 stars
Updated 1 month ago

VITA-MLLM / Woodpecker

#LLM# ✨✨ Woodpecker: Hallucination Correction for Multimodal Large Language Models

hallucination, hallucinations, large-language-models, llm, mllm, multimodal-large-language-models, multimodality
Python · 636 stars
Updated 6 months ago

FoundationVision / Groma

#LLM# [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

grounding, llm, mllm, large-language-models, foundation-models, llama, llama2, multimodal, vision-language-model
Python · 567 stars
Updated 1 year ago

SkyworkAI / Vitron

NeurIPS 2024 paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, and Editing

mllm, multimodal-large-language-models, segmentation
Python · 545 stars
Updated 8 months ago