GitHub Chinese Community
grpo

modelscope / ms-swift

#large language model# Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava,...

large language model, lora, llama, sft, deploy, multimodal, peft, internvl, liger, qwen2-vl, rft, deepseek-r1, embedding, grpo, open-r1, megatron, omni, llama4, qwen3, qwen3-moe
Python 9 k
2 hours ago
om-ai-lab / VLM-R1

#large language model# Solve Visual Understanding with Reinforced VLMs

deepseek-r1, grpo, large language model, multimodal, vlm, qwen, reinforcement-learning
Python 5.38 k
1 month ago
OpenPipe / ART

Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, Kimi, and more!

large language model, lora, reinforcement-learning, agent, agentic-ai, grpo, rl, kimi-ai, qwen, qwen3
Python 3.97 k
1 hour ago
SkyworkAI / Skywork-R1V

#large language model# Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI (Kunlun Inc.), specializing in vision-language reasoning.

deepseek-r1, large language model, reasoning, vlm, grpo, reinforcement-learning
Python 2.92 k
15 days ago
langfengQ / verl-agent

verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training".

llm-agents, llm-training, reinforcement-learning, large-language-models, deepseek-r1, grpo, agent-framework
Python 647
10 days ago
turningpoint-ai / VisualThinker-R1-Zero

Explore the Multimodal “Aha Moment” on 2B Model

multimodal, reasoning, grpo, reinforcement-learning, deepseek, deepseek-r1, deepseek-r1-zero
Python 603
4 months ago
sail-sg / oat

#large language model# 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

alignment, dpo, large language model, rlhf, distributed-training, reasoning, grpo, ppo
Python 416
3 days ago
modelscope / awesome-deep-reasoning

Collect every awesome work about r1!

collection, deepseek, grpo, o1, qwen, rl, reasoning
Python 399
3 months ago
zhaochen0110 / OpenThinkIMG

OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.

grpo, reinforcement-learning
Jupyter Notebook 274
2 months ago
jiangxinke / Agentic-RAG-R1

Agentic RAG R1 Framework via Reinforcement Learning

agentic, grpo, rag, rl
Python 267
2 months ago
jianzhnie / Open-R1

#large language model# The open source implementation of DeepSeek-R1. An open-source reproduction of DeepSeek-R1.

large language model, rlhf, deepseek-r1, grpo, deepseek-v3
Python 265
5 months ago
bowang-lab / BioReason

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model

computational-biology, dna, reasoning, Bioinformatics, grpo, large-language-models, foundation-models
Jupyter Notebook 262
2 months ago
hustvl / AlphaDrive

Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

autonomous-driving, grpo, planning, reasoning, reinforcement-learning, vision-language-model
Python 250
4 months ago
anakin87 / qwen-scheduler-grpo

#large language model# Train a Language Model with GRPO to create a schedule from a list of events and priorities

fine-tuning, grpo, large language model, reasoning, reinforcement-learning
Jupyter Notebook 215
3 months ago
VainF / Thinkless

[Preprint 2025] Thinkless: LLM Learns When to Think

grpo, large language model, reinforcement-learning
Python 213
1 month ago
ucla-mobility / AutoVLA

Official implementation of the paper "AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning"

autonomous-driving, grpo
159
1 month ago
ritzz-ai / GUI-R1

Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents

deep-reinforcement-learning, gui-agent, large-multimodal-models, multimodal, multimodal-large-language-models, grpo, o1
Python 152
3 months ago
ZJU-REAL / GUI-G2

A Gaussian dense reward framework for GUI grounding training

grpo
Python 117
9 days ago
yihedeng9 / OpenVLThinker

OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement

grpo, rl, vision-language-model
Python 100
7 days ago
IDEA-Research / Rex-Thinker

Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning

mllm, object-detection, referring-expression-comprehension, grpo
Python 100
1 month ago