GitHub Chinese Community
#mllm

microsoft / unilm

#NLP# UniLM: large-scale self-supervised pre-training across tasks, languages, and modalities

NLP, pre-trained-model, unilm, minilm, layoutlm, layoutxlm, beit, document-ai, trocr, beit-3, foundation-models, xlm-e, deepnet, LLM, multimodal, mllm, kosmos, kosmos-1, textdiffuser, bitnet
Python 21.58 k
1 month ago
simular-ai / Agent-S

Agent S: an open agentic framework that uses computers like a human

agent-computer-interface, ai-agents, computer-automation, gui-agents, memory, mllm, planning, retrieval-augmented-generation, in-context-reinforcement-learning, computer-use, grounding
Python 5.91 k
13 days ago
X-PLUG / MobileAgent

#Android# Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

agent, gpt4v, mllm, mobile-agents, multimodal, multimodal-large-language-models, multimodal-agent, Android, App, GUI, mobile, automation, copilot, harmony, iOS
Python 4.49 k
1 month ago
manycore-research / SpatialLM

SpatialLM: Training Large Language Models for Structured Indoor Modeling

mllm
Python 3.55 k
8 days ago
NExT-GPT / NExT-GPT

#LLM# Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model

ChatGPT, foundation-models, gpt-4, instruction-tuning, large-language-models, LLM, multi-modal-chatgpt, multimodal, visual-language-learning, mllm
Python 3.54 k
3 months ago
ant-research / MagicQuill

[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System

aigc, image-editing, mllm, gradio
Python 3.51 k
1 day ago
atfortes / Awesome-LLM-Reasoning

#LLM# Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓

language-models, reasoning, prompt, in-context-learning, ChatGPT, chain-of-thought, prompt-engineering, cot, Awesome Lists, gpt, mllm, multimodal, papers, gpt-4o, openai-o1, strawberry, deepseek, deepseek-r1
3.26 k
3 months ago
InternLM / InternLM-XComposer

#LLM# InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

ChatGPT, visual-language-learning, multi-modality, foundation, gpt-4, instruction-tuning, mllm, multimodal, vision-language-model, language-model, LLM, large-vision-language-model, vision-transformer, gpt
Python 2.88 k
2 months ago
X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

chart-understanding, document-understanding, mllm, multimodal, multimodal-large-language-models, table-understanding
Python 2.23 k
2 months ago
cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

chatbot, clip, computer vision, dino, instruction-tuning, large-language-models, LLM, mllm, multimodal-large-language-models, representation-learning
Python 1.93 k
9 months ago
coderonion / awesome-yolo-object-detection

#Data Warehouse# 🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.

yolo, yolov5, tensorrt, object-detection, yolov8, CUDA, LLM, llama, vlm, dataset, deepseek, GUI, mllm, qwen
1.55 k
2 months ago
magic-research / Sa2VA

🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

computer vision, mllm, large-language-models
Python 1.2 k
1 month ago
BAAI-DCAI / Bunny

#LLM# A family of lightweight multimodal models.

mllm, ChatGPT, gpt-4, multimodal-large-language-models, vlm, Chinese, english
Python 1.02 k
8 months ago
NVlabs / EAGLE

#LLM# Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs

Demo, gpt4, huggingface, llama, llama3, llava, lmm, mllm, LLM, large-language-models
Python 838
3 months ago
CircleRadon / Osprey

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

mllm, sam, visual-instruction-tuning, pixel-understanding
Python 826
3 months ago
taco-group / OpenEMMA

#Algorithm Practice# OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.

algorithm, artificial intelligence, autonomous-driving, autonomous-vehicles, autonomy, generative-ai, machine learning, mllm, Network, perception
Python 755
3 months ago
coderonion / awesome-llm-and-aigc

#Data Warehouse# 🚀🚀🚀 A collection of some awesome public projects about Large Language Model (LLM), Vision Language Model (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), the related Datasets and Applic...

gpt, LLM, Awesome Lists, llama, aigc, langchain, dataset, yolo, triton, CUDA, vlm, deepseek, qwen, mllm, ai4science, reinforcement-learning, qwen3
720
7 days ago
VITA-MLLM / Woodpecker

#LLM# ✨✨ Woodpecker: Hallucination Correction for Multimodal Large Language Models

hallucination, hallucinations, large-language-models, LLM, mllm, multimodal-large-language-models, multimodality
Python 639
7 months ago
LYL1015 / JarvisArt

JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

agent, image processing, large-language-models, mllm
JavaScript 581
5 days ago
FoundationVision / Groma

#LLM# [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

grounding, LLM, mllm, large-language-models, foundation-models, llama, llama2, multimodal, vision-language-model
Python 575
1 year ago