#Large Language Models# Use PEFT or full-parameter training to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4v, ...
#Large Language Models# Solve Visual Understanding with Reinforced VLMs
#Large Language Models# Skywork-R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
Explore the Multimodal “Aha Moment” on 2B Model
A collection of every awesome work about R1!
#Large Language Models# 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
#Large Language Models# The open-source implementation and reproduction of DeepSeek-R1.
Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
#Large Language Models# Train a language model with GRPO to create a schedule from a list of events and priorities
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model
#Large Language Models# [Preprint 2025] Thinkless: LLM Learns When to Think
Official implementation of GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
Official repository for "RLVR-World: Training World Models with Reinforcement Learning", https://arxiv.org/abs/2505.13934
#Large Language Models# R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning.