grpo · GitHub Topics

#LLM# Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, P...

llm lora llama sft deploy multimodal peft internvl liger qwen2-vl rft deepseek-r1 embedding grpo open-r1 megatron omni llama4 qwen3 qwen3-moe

Python 9.84 k

2 days ago

OpenPipe / ART

Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!

llm lora reinforcement-learning agent agentic-ai grpo rl qwen qwen3

Python 7.19 k

5 days ago

om-ai-lab / VLM-R1

#LLM# Solve Visual Understanding with Reinforced VLMs

deepseek-r1 grpo llm multimodal vlm qwen reinforcement-learning

Python 5.53 k

16 days ago

SkyworkAI / Skywork-R1V

#LLM# Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI (Kunlun Inc.), specializing in vision-language reasoning.

deepseek-r1 llm reasoning vlm grpo reinforcement-learning

Python 2.94 k

1 month ago

JudgmentLabs / judgeval

#LLM# The open source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.

langchain langgraph llama-index llm llm-evaluation llm-observability open-source openai prompt-engineering agent agentic-ai agents grpo reinforcement-learning rl

Python 990

2 days ago

langfengQ / verl-agent

verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in-Group Policy Optimization for LLM Agent Training"

llm-agents llm-training reinforcement-learning large-language-models deepseek-r1 grpo agent-framework

Python 879

7 days ago

turningpoint-ai / VisualThinker-R1-Zero

Explore the Multimodal “Aha Moment” on 2B Model

multimodal post-training reasoning grpo reinforcement-learning deepseek deepseek-r1 deepseek-r1-zero

Python 608

6 months ago

Tencent-Hunyuan / MixGRPO

MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

diffusion grpo reinforcement-learning

Python 597

1 month ago

sail-sg / oat

#LLM# 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

alignment dpo llm rlhf distributed-training reasoning grpo ppo

Python 460

2 days ago

modelscope / awesome-deep-reasoning

Collect every awesome work about r1!

collection deepseek grpo o1 qwen rl reasoning

Python 416

4 months ago

zhaochen0110 / OpenThinkIMG

OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.

grpo reinforcement-learning

Jupyter Notebook 303

3 months ago

jiangxinke / Agentic-RAG-R1

Agentic RAG R1 Framework via Reinforcement Learning

agentic grpo rag rl

Python 291

2 days ago

bowang-lab / BioReason

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model

computational-biology dna reasoning Bioinformatics grpo large-language-models foundation-models

Jupyter Notebook 280

3 months ago

hustvl / AlphaDrive

Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

autonomous-driving grpo planning reasoning reinforcement-learning vision-language-model

Python 277

6 months ago

jianzhnie / Open-R1

#LLM# The open source implementation of DeepSeek-R1. An open-source reproduction of DeepSeek-R1.

llm rlhf deepseek-r1 grpo deepseek-v3

Python 270

6 months ago

anakin87 / qwen-scheduler-grpo

#LLM# Train a Language Model with GRPO to create a schedule from a list of events and priorities

fine-tuning grpo llm post-training reasoning reinforcement-learning

Jupyter Notebook 231

5 months ago

ZJU-REAL / GUI-G2

A Gaussian dense reward framework for GUI grounding training

grpo

Python 225

21 days ago

VainF / Thinkless

[Preprint 2025] Thinkless: LLM Learns When to Think

grpo llm reinforcement-learning

Python 223

3 months ago

ucla-mobility / AutoVLA

Official implementation of paper "AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning"

autonomous-driving grpo

209

3 months ago

SunzeY / SEAgent

Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"

agent computer-use-agent gui-agent grpo rl vllm

Python 187

1 month ago