GitHub 中文社区

©2025 GitHub中文社区


Topic: lmm
BAAI-Agents / Cradle

#LLM# The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to tackle any computer task through strong reasoning abilities, self-improvement, and skill curation, ...

ai-agent, ai-agents-framework, computer-control, cradle, gcc, generative-ai, grounding, large-language-models, lmm, multimodality, vision-language-model, vlm, artificial-intelligence
Python · 2.22k stars
9 months ago
mbzuai-oryx / groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

foundation-models, lmm, vision-and-language, vision-language-model, llm-agent
Python · 901 stars
2 months ago
NVlabs / EAGLE

#LLM# Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs

demo, gpt4, huggingface, llama, llama3, llava, lmm, mllm, large-language-model, large-language-models
Python · 841 stars
3 months ago
LLaVA-VL / LLaVA-Interactive-Demo

LLaVA-Interactive-Demo

lmm, multimodal
Python · 375 stars
1 year ago
tianyi-lab / HallusionBench

#LLM# [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

benchmark, vlms, gpt-4, gpt-4v, llava, benchmarks, hallucination, large-language-model, lmm, large-language-models, large-vision-language-models
Python · 293 stars
9 months ago
CircleRadon / TokenPacker

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025

connector, lmm, mllm
Python · 260 stars
2 months ago
mbzuai-oryx / Video-LLaVA

#LLM# PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

large-language-model, lmm, video, grounding, transcription
Python · 257 stars
2 years ago
Javis603 / Discord-AIBot

#LLM# 🤖 Discord AI assistant with OpenAI, Gemini, Claude & DeepSeek integration, multilingual support, multimodal chat, image generation, web search, and deep thinking | A powerful Discord AI assistant integrating multiple top AI models, supporting...

artificial-intelligence, chatbot, chatgpt, claude, deepseek, discord, discord-bot, discord.js, gemini, large-language-model, lmm, node.js, openai, xai
JavaScript · 242 stars
5 months ago
TIGER-AI-Lab / Mantis

Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]

language, vision, lmm, mllm, video, vlm, multimodal
Python · 221 stars
4 months ago
TideDra / VL-RLHF

#LLM# An RLHF infrastructure for vision-language models

dpo, large-language-model, lmm, mllm, rlhf, vlm
Python · 179 stars
9 months ago
xieyuquanxx / awesome-Large-MultiModal-Hallucination

😎 A curated list of awesome LMM hallucination papers, methods & resources.

hallucination, multi-modal, lmm, multimodal
149 stars
1 year ago
Q-Future / A-Bench

[ICLR 2025] What do we expect from LMMs as AIGI evaluators and how do they perform?

evaluation, lmm
143 stars
6 months ago
Chenyu-Wang567 / MLLM-Tool

#LLM# MLLM-Tool: A Multimodal Large Language Model for Tool Agent Learning

gpt4, large-language-model, lmm
Python · 127 stars
1 year ago
graphic-design-ai / graphist

#LLM# Official repo of Graphist

graphic-design, large-language-model, lmm, mllm
122 stars
1 year ago
WisconsinAIVision / YoLLaVA

#LLM# 🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant

llava, large-language-model, lmm, lmms, personalization, neurips
Python · 111 stars
4 months ago
mbzuai-oryx / VideoGLaMM

[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

foundation-models, llm-agent, lmm, vision-and-language, vision-language-model
Python · 74 stars
4 months ago
uni-medical / GMAI-MMBench

#LLM# GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.

benchmark, large-language-model, lmm, medical, vlm
70 stars
7 months ago
yisuanwang / Idea23D

[COLING 2025] Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs

3d, aigc, agent, lmm
Jupyter Notebook · 51 stars
6 months ago
mapluisch / LLaVA-CLI-with-multiple-images

LLaVA inference with multiple images at once for cross-image analysis.

image-processing, inference, llama2, llava, python, lmm, lmms, pillow, pytorch, visual-question-answering, vqa
Python · 50 stars
1 year ago
Haochen-Wang409 / TreeVGR

#LLM# Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"

grounding, grpo, large-language-model, lmm, mllm, o3, rl
Python · 48 stars
18 days ago