GitHub Chinese Community
#

grpo

modelscope / ms-swift

#Large Language Model# Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, P...

large language model, lora, llama, sft, deploy, multimodal, peft, internvl, liger, qwen2-vl, rft, deepseek-r1, embedding, grpo, open-r1, megatron, omni, llama4, qwen3, qwen3-moe
Python 9.84 k
2 days ago

OpenPipe / ART

Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!

large language model, lora, reinforcement-learning, agent, agentic-ai, grpo, rl, qwen, qwen3
Python 7.19 k
5 days ago

om-ai-lab / VLM-R1

#Large Language Model# Solve Visual Understanding with Reinforced VLMs

deepseek-r1, grpo, large language model, multimodal, vlm, qwen, reinforcement-learning
Python 5.53 k
16 days ago

SkyworkAI / Skywork-R1V

#Large Language Model# Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI (Kunlun Inc.), specializing in vision-language reasoning.

deepseek-r1, large language model, reasoning, vlm, grpo, reinforcement-learning
Python 2.94 k
1 month ago

JudgmentLabs / judgeval

#Large Language Model# The open source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.

langchain, langgraph, llama-index, large language model, llm-evaluation, llm-observability, Open Source, openai, prompt-engineering, agent, agentic-ai, agents, grpo, reinforcement-learning, rl
Python 990
2 days ago

langfengQ / verl-agent

verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in-Group Policy Optimization for LLM Agent Training"

llm-agents, llm-training, reinforcement-learning, large-language-models, deepseek-r1, grpo, agent-framework
Python 879
7 days ago

turningpoint-ai / VisualThinker-R1-Zero

Explore the Multimodal “Aha Moment” on 2B Model

multimodal, post-training, reasoning, grpo, reinforcement-learning, deepseek, deepseek-r1, deepseek-r1-zero
Python 608
6 months ago

Tencent-Hunyuan / MixGRPO

MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

diffusion, grpo, reinforcement-learning
Python 597
1 month ago

sail-sg / oat

#Large Language Model# 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

alignment, dpo, large language model, rlhf, distributed-training, reasoning, grpo, ppo
Python 460
2 days ago

modelscope / awesome-deep-reasoning

Collect every awesome work about r1!

collection, deepseek, grpo, o1, qwen, rl, reasoning
Python 416
4 months ago

zhaochen0110 / OpenThinkIMG

OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.

grpo, reinforcement-learning
Jupyter Notebook 303
3 months ago

jiangxinke / Agentic-RAG-R1

Agentic RAG R1 Framework via Reinforcement Learning

agentic, grpo, rag, rl
Python 291
2 days ago

bowang-lab / BioReason

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model

computational-biology, dna, reasoning, Bioinformatics, grpo, large-language-models, foundation-models
Jupyter Notebook 280
3 months ago

hustvl / AlphaDrive

Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

autonomous-driving, grpo, planning, reasoning, reinforcement-learning, vision-language-model
Python 277
6 months ago

jianzhnie / Open-R1

#Large Language Model# The open source implementation of DeepSeek-R1. An open-source reproduction of DeepSeek-R1

large language model, rlhf, deepseek-r1, grpo, deepseek-v3
Python 270
6 months ago

anakin87 / qwen-scheduler-grpo

#Large Language Model# Train a Language Model with GRPO to create a schedule from a list of events and priorities

fine-tuning, grpo, large language model, post-training, reasoning, reinforcement-learning
Jupyter Notebook 231
5 months ago

ZJU-REAL / GUI-G2

A Gaussian dense reward framework for GUI grounding training

grpo
Python 225
21 days ago

VainF / Thinkless

[Preprint 2025] Thinkless: LLM Learns When to Think

grpo, large language model, reinforcement-learning
Python 223
3 months ago

ucla-mobility / AutoVLA

Official implementation of paper "AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning"

autonomous-driving, grpo
209
3 months ago

SunzeY / SEAgent

Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"

agent, computer-use-agent, gui-agent, grpo, rl, vllm
Python 187
1 month ago