post-training · GitHub Topics

hao-ai-lab / FastVideo

A unified inference and post-training framework for accelerated video generation.

diffusers diffusion-models video-generation distillation inference post-training

Python 2.22k

14 hours ago

mbzuai-oryx / Awesome-LLM-Post-training

Awesome Reasoning LLM Tutorial/Survey/Guide

large-language-models post-training reasoning reinforcement-learning scaling

Python 2.06k

2 months ago

SmartFlowAI / EmoLLM

Mental health large language model (LLM x Mental Health): pre-training, post-training, dataset, evaluation, deployment, and RAG, with InternLM / Qwen / Baichuan / DeepSeek / Mixtral / LLama / GLM series models

llm dataset evaluation post-training

Python 1.56 k

1 month ago

turningpoint-ai / VisualThinker-R1-Zero

Explore the Multimodal “Aha Moment” on a 2B Model

multimodal post-training reasoning grpo reinforcement-learning deepseek deepseek-r1 deepseek-r1-zero

Python 608

6 months ago

anakin87 / qwen-scheduler-grpo

Train a Language Model with GRPO to create a schedule from a list of events and priorities

fine-tuning grpo llm post-training reasoning reinforcement-learning

Jupyter Notebook 231

5 months ago

GAIR-NLP / OctoThinker

Revisiting Mid-training in the Era of Reinforcement Learning Scaling

llama llm post-training pre-training qwen reasoning rl

Jupyter Notebook 172

2 months ago

yihedeng9 / rlhf-summary-notes

A brief and partial summary of RLHF algorithms.

deep-learning large-language-models post-training reinforcement-learning rlhf

132

6 months ago

AoqunJin / Awesome-VLA-Post-Training

A collection of vision-language-action model post-training methods.

embodied-agent embodied-ai fine-tuning post-training

17 days ago

GAIR-NLP / MegaScience

MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

llama llm post-training qwen reasoning science

Python 97

2 months ago

tiannuo-yang / SearchAgent-X

A High-Efficiency System of Large Language Model Based Search Agents

agent artificial-intelligence approximate-nearest-neighbor-search information-retrieval llm rag vllm llm-serving post-training rlhf

Python 73

2 months ago

ReinFlow / ReinFlow

Flow RL. ReinFlow: Fine-tuning Flow Policy with Online RL (Reinforcement Learning).

rl Robotics fine-tuning post-training humanoid locomotion manipulation robot-learning flow

Python 63

18 days ago

Jialuo-Li / Science-T2I

[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis

benchmark computer-vision dataset generative-model post-training science

Python 59

5 months ago

bobxwu / learning-from-rewards-llm-papers

A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward models and learning strategies across training, inference, and post-...

large-language-models llm post-training reinforcement-learning

3 months ago

DolbyUUU / Logic-RL-Lite

Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".

deepseek deepseek-r1 fine-tuning llm post-training reinforcement-learning

Python 48

5 months ago