qwen2-vl · GitHub Topics

#large-language-model#Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava,...

large-language-model lora llama sft deploy multimodal peft internvl liger qwen2-vl rft deepseek-r1 embedding grpo open-r1 megatron omni llama4 qwen3 qwen3-moe

Python 9 k

2 hours ago

langmanus / langmanus

#large-language-model#A community-driven AI automation framework that builds upon the incredible work of the open source community. Our goal is to combine language models with specialized tools for tasks like web search, c...

agi automation deep-research langchain langgraph large-language-model qwen qwen2-vl agent agents artificial-intelligence multi-agent multi-agent-systems deepseek deepseek-r1

Python 5.13 k

4 months ago

roboflow / maestro

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

captioning fine-tuning florence-2 multimodal objectdetection paligemma phi-3-vision transformers vision-and-language vqa qwen2-vl

Python 2.6 k

3 days ago

2U1 / Qwen2-VL-Finetune

An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.

chatbot multimodal qwen2-vl vision-language vision-language-model qwen2-5

Python 1.01 k

6 days ago

PaddlePaddle / PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high per...

aigc stable-diffusion clip image-to-text text-to-image controlnet multimodal text-to-video dit llava sora qwen2-vl minicpm-v

Python 675

1 day ago

NetEase-Media / grps_trtllm

#large-language-model#Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GRPS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call, AI agents, d...

large-language-model openai tensorrt-llm chatglm llama3 qwen2 function-call ai-agent llama-index multi-modal deepseek-r1 phi qwq qwen2-vl minicpm-v internvl qwen3

Python 148

3 months ago

lucasjinreal / Crane

A pure Rust-based LLM inference engine (also for any LLM-based MLLM, such as Spark-TTS), powered by the Candle framework.

llama-cpp mllm qwen2-vl Rust qwen3

Rust 143

5 days ago

drive-bench / toolkit

#large-language-model#[ICCV 2025] Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

autonomous-driving ChatGPT internvl qwen2-vl

Python 95

3 days ago

arcstep / illufly

#large-language-model#✨🦋 illufly - [Phantom Butterfly] A self-evolving agent based on memory distillation and document retrieval

agent artificial-intelligence glm-4 gpt large-language-model multiagent openai qwen qwen2 qwen2-vl rag growth

Python 69

2 months ago

soulteary / dify-with-qwen-vl

Video understanding: Qwen multimodal video model & Dify

dify qwen2 qwen2-vl

Python 62

1 year ago

col14m / cadrille

#large-language-model#cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning

cad large-language-model PyTorch qwen2-vl vlm

Python 44

2 months ago

fireicewolf / wd-llm-caption-cli

A Python-based CLI tool for captioning images with WD series, Joy-caption-pre-alpha, Meta Llama 3.2 Vision Instruct, and Qwen2 VL Instruct models.

qwen2-vl florence-2

Python 38

4 months ago

see2023 / autoXHS

#web-crawler#An intelligent search assistant based on multimodal large models, enabling smart information retrieval and knowledge integration on the Xiaohongshu platform.

large-language-model qwen2-vl Selenium xiaohongshu spider

Python 23

9 months ago

shaadclt / Qwen2-VL-OCR-VQA

This project demonstrates how to use the Qwen2-VL model from Hugging Face for Optical Character Recognition (OCR) and Visual Question Answering (VQA). The model combines vision and language capabiliti...

optical-character-recognition qwen2-vl visual-question-answering

Jupyter Notebook 19

9 months ago

Younis-Ahmed / qwen-ai-provider

Community-built Qwen AI Provider for Vercel AI SDK - Integrate Alibaba Cloud's Qwen models with Vercel's AI application framework

artificial-intelligence vercel-ai vercel-ai-sdk qwen qwen2-5 qwen2-vl generative-ai Vercel alibaba-cloud language-model

TypeScript 18

16 hours ago

BUAADreamer / Qwen2-VL-History

A LLaMA-Factory fine-tuning case study for Qwen2-VL in the cultural tourism domain (historical literature and museums)

history mllm multimodal-large-language-models qwen2-vl

10 months ago

zhangguanghao523 / CMMCoT

Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation

chain-of-thought cot mllm qwen2-vl

Python 9

3 months ago

ZachcZhang / Qwen2-VL-inference

An open-source server implementation for running inference with Qwen2-VL series models using FastAPI.

FastAPI huggingface inference mllm qwen2-vl

Python 9

8 months ago

Valdanitooooo / chat_with_qwen2_vl_test

qwen2-vl

Python 8

7 months ago

aws-samples / sample-for-multi-modal-document-to-json-with-sagemaker-ai

This open-source project delivers a complete pipeline for converting multi-page documents (PDFs/images) into structured JSON using Vision LLMs on Amazon SageMaker. The solution leverages the SWIFT Fra...

Amazon Web Services document-processing fine-tuning huggingface idp llama multimodal qwen2-vl sagemaker sft Swift

Jupyter Notebook 8

15 days ago