mllm · GitHub Topics

#Natural Language Processing#UniLM is a large-scale self-supervised pre-trained model across tasks, languages, and modalities

natural-language-processing pre-trained-model unilm minilm layoutlm layoutxlm beit document-ai trocr beit-3 foundation-models xlm-e deepnet llm multimodal mllm kosmos kosmos-1 textdiffuser bitnet

Python 21.58 k

1 month ago

simular-ai / Agent-S

Agent S: an open agentic framework that uses computers like a human

agent-computer-interface ai-agents computer-automation gui-agents memory mllm planning retrieval-augmented-generation in-context-reinforcement-learning computer-use grounding

Python 5.91 k

13 days ago

X-PLUG / MobileAgent

#Android#Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

agent gpt4v mllm mobile-agents multimodal multimodal-large-language-models multimodal-agent Android App GUI mobile-automation copilot harmony iOS

Python 4.49 k

1 month ago

manycore-research / SpatialLM

SpatialLM: Training Large Language Models for Structured Indoor Modeling

mllm

Python 3.55 k

8 days ago

NExT-GPT / NExT-GPT

#LLM#Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model

ChatGPT foundation-models gpt-4 instruction-tuning large-language-models llm multi-modal-chatgpt multimodal visual-language-learning mllm

Python 3.54 k

3 months ago

ant-research / MagicQuill

[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System

aigc image-editing mllm gradio

Python 3.51 k

1 day ago

atfortes / Awesome-LLM-Reasoning

#LLM#Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓

language-models reasoning prompt in-context-learning ChatGPT chain-of-thought prompt-engineering cot Awesome Lists gpt mllm multimodal papers gpt-4o openai-o1 strawberry deepseek deepseek-r1

3.26 k

3 months ago

InternLM / InternLM-XComposer

#LLM#InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

ChatGPT visual-language-learning multi-modality foundation gpt-4 instruction-tuning mllm multimodal vision-language-model language-model llm large-vision-language-model vision-transformer gpt

Python 2.88 k

2 months ago

X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

chart-understanding document-understanding mllm multimodal multimodal-large-language-models table-understanding

Python 2.23 k

2 months ago

cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

chatbot clip machine-vision dino instruction-tuning large-language-models llm mllm multimodal-large-language-models representation-learning

Python 1.93 k

9 months ago

coderonion / awesome-yolo-object-detection

#Data Warehouse#🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.

yolo yolov5 tensorrt object-detection yolov8 CUDA llm llama vlm dataset deepseek GUI mllm qwen

1.55 k

2 months ago

magic-research / Sa2VA

🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

machine-vision mllm large-language-models

Python 1.2 k

1 month ago

BAAI-DCAI / Bunny

#LLM#A family of lightweight multimodal models.

mllm ChatGPT gpt-4 multimodal-large-language-models vlm chinese english

Python 1.02 k

8 months ago

NVlabs / EAGLE

#LLM#Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs

Demo gpt4 huggingface llama llama3 llava lmm mllm llm large-language-models

Python 838

3 months ago

CircleRadon / Osprey

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

mllm sam visual-instruction-tuning pixel-understanding

Python 826

3 months ago

taco-group / OpenEMMA

#Algorithm Practice#OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.

algorithm artificial-intelligence autonomous-driving autonomous-vehicles autonomy generative-ai machine-learning mllm Network perception

Python 755

3 months ago

coderonion / awesome-llm-and-aigc

#Data Warehouse#🚀🚀🚀 A collection of some awesome public projects about Large Language Model (LLM), Vision Language Model (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), the related Datasets and Applic...

gpt llm Awesome Lists llama aigc langchain dataset yolo triton CUDA vlm deepseek qwen mllm ai4science reinforcement-learning qwen3

720

7 days ago

VITA-MLLM / Woodpecker

#LLM#✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

hallucination hallucinations large-language-models llm mllm multimodal-large-language-models multimodality

Python 639

7 months ago

LYL1015 / JarvisArt

JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

agent image-processing large-language-models mllm

JavaScript 581

5 days ago

FoundationVision / Groma

#LLM#[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

grounding llm mllm large-language-models foundation-models llama llama2 multimodal vision-language-model

Python 575

1 year ago