A unified inference and post-training framework for accelerated video generation.
Awesome Reasoning LLM Tutorial/Survey/Guide
#Large Language Models#Mental health LLMs (LLM x Mental Health), Pre & Post-training & Dataset & Evaluation & Deploy & RAG, with InternLM / Qwen / Baichuan / DeepSeek / Mixtral / LLaMA / GLM series models
Explore the Multimodal “Aha Moment” on a 2B Model
#Large Language Models#Train a Language Model with GRPO to create a schedule from a list of events and priorities
#Large Language Models#Revisiting Mid-training in the Era of Reinforcement Learning Scaling
#Computer Science#A brief and partial summary of RLHF algorithms.
A collection of vision-language-action model post-training methods.
#Large Language Models#MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
#Large Language Models#A High-Efficiency System of Large Language Model Based Search Agents
Flow RL. ReinFlow: Fine-tuning Flow Policy with Online RL (Reinforcement Learning).
[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis
#Large Language Models#A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward models and learning strategies across training, inference, and post-...
#Large Language Models#Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".
Exploring Diffusion Transformer Designs via Grafting
#Large Language Models#A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
#Natural Language Processing#[EMNLP 2022] Continual Training of Language Models for Few-Shot Learning
#Large Language Models#Pure RL to post-train base models for social reasoning capabilities. Lightweight replication of DeepSeek-R1-Zero with Social IQa dataset.
#Large Language Models#The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)
RFTT: Reasoning with Reinforced Functional Token Tuning