#grpo

modelscope / ms-swift

#Large language models# Use PEFT or full-parameter training to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4v, ...

large language models, lora, llama, sft, deploy, multimodal, peft, internvl, liger, qwen2-vl, rft, deepseek-r1, embedding, grpo, open-r1, megatron, omni, llama4, qwen3, qwen3-moe
Python 8.09 k
2 days ago
om-ai-lab / VLM-R1

#Large language models# Solve Visual Understanding with Reinforced VLMs

deepseek-r1, grpo, large language models, multimodal, vlm, qwen, reinforcement-learning
Python 5.14 k
1 month ago
SkyworkAI / Skywork-R1V

#Large language models# Skywork-R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning

deepseek-r1, large language models, reasoning, vlm, grpo, reinforcement-learning
Python 2.62 k
6 days ago
turningpoint-ai / VisualThinker-R1-Zero

Explore the Multimodal “Aha Moment” on 2B Model

multimodal, reasoning, grpo, reinforcement-learning, deepseek, deepseek-r1, deepseek-r1-zero
Python 592
3 months ago
modelscope / awesome-deep-reasoning

Collect every awesome work about r1!

collection, deepseek, grpo, o1, qwen, rl, reasoning
Python 382
1 month ago
sail-sg / oat

#Large language models# 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

alignment, dpo, large language models, rlhf, distributed-training, reasoning, grpo, ppo
Python 376
5 days ago
jianzhnie / Open-R1

#Large language models# The open-source implementation (reproduction) of DeepSeek-R1.

large language models, rlhf, deepseek-r1, grpo, deepseek-v3
Python 258
3 months ago
hustvl / AlphaDrive

Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

autonomous-driving, grpo, planning, reasoning, reinforcement-learning, vision-language-model
Python 235
3 months ago
zhaochen0110 / OpenThinkIMG

OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.

grpo, reinforcement-learning
Jupyter Notebook 219
14 days ago
jiangxinke / Agentic-RAG-R1

Agentic RAG R1 Framework via Reinforcement Learning

agentic, grpo, rag, rl
Python 204
23 days ago
anakin87 / qwen-scheduler-grpo

#Large language models# Train a Language Model with GRPO to create a schedule from a list of events and priorities

fine-tuning, grpo, large language models, reasoning, reinforcement-learning
Jupyter Notebook 200
2 months ago
bowang-lab / BioReason

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model

computational-biology, dna, reasoning, Bioinformatics, grpo, large-language-models, foundation-models
Python 186
6 days ago
VainF / Thinkless

#Large language models# [Preprint 2025] Thinkless: LLM Learns When to Think

adaptive, efficiency, grpo, large language models, reasoning
Python 140
11 days ago
ritzz-ai / GUI-R1

Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents

deep-reinforcement-learning, gui-agent, large-multimodal-models, multimodal, multimodal-large-language-models, grpo, o1
Python 110
1 month ago
yihedeng9 / OpenVLThinker

OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement

grpo, rl, vision-language-model
Python 89
1 month ago
Goekdeniz-Guelmez / mlx-lm-lora

#Computer science# Train Large Language Models on MLX.

Apple, deep learning, dpo, grpo, machine learning, MLX, sft, training
Python 84
7 days ago
aim-uofa / Active-o3

ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO

grpo, o3, rl
59
17 days ago
aim-uofa / Omni-R1

Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

grpo, rl
Python 58
13 days ago
thuml / RLVR-World

Official repository for "RLVR-World: Training World Models with Reinforcement Learning", https://arxiv.org/abs/2505.13934

grpo, video-generation, video-prediction, web-agent, world-model
Python 39
6 days ago
Wangbiao2 / R1-Track

#Large language models# R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning.

deep-reinforcement-learning, grpo, mllm, multimodal, object-tracking, large language models, single-object-tracking
Python 37
1 month ago