GitHub Chinese Community (GitHub 中文社区)
#post-training

hao-ai-lab / FastVideo

A unified inference and post-training framework for accelerated video generation.

diffusers, diffusion-models, video-generation, distillation, inference, post-training
Python 2.22k
14 hours ago
mbzuai-oryx / Awesome-LLM-Post-training

Awesome Reasoning LLM Tutorial/Survey/Guide

large-language-models, post-training, reasoning, reinforcement-learning, scaling
Python 2.06k
2 months ago
SmartFlowAI / EmoLLM

#LLM# Mental-health LLM (LLM x Mental Health): pre- and post-training, dataset, evaluation, deployment, and RAG, with InternLM / Qwen / Baichuan / DeepSeek / Mixtral / LLama / GLM series models

LLM, dataset, evaluation, post-training
Python 1.56k
1 month ago
turningpoint-ai / VisualThinker-R1-Zero

Explore the Multimodal “Aha Moment” on 2B Model

multimodal, post-training, reasoning, grpo, reinforcement-learning, deepseek, deepseek-r1, deepseek-r1-zero
Python 608
6 months ago
anakin87 / qwen-scheduler-grpo

#LLM# Train a Language Model with GRPO to create a schedule from a list of events and priorities

fine-tuning, grpo, LLM, post-training, reasoning, reinforcement-learning
Jupyter Notebook 231
5 months ago
GAIR-NLP / OctoThinker

#LLM# Revisiting Mid-training in the Era of Reinforcement Learning Scaling

llama, LLM, post-training, pre-training, qwen, reasoning, rl
Jupyter Notebook 172
2 months ago
yihedeng9 / rlhf-summary-notes

#Computer Science# A brief and partial summary of RLHF algorithms.

deep-learning, large-language-models, post-training, reinforcement-learning, rlhf
132
6 months ago
AoqunJin / Awesome-VLA-Post-Training

A collection of vision-language-action model post-training methods.

embodied-agent, embodied-ai, fine-tuning, post-training
98
17 days ago
GAIR-NLP / MegaScience

#LLM# MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

llama, LLM, post-training, qwen, reasoning, science
Python 97
2 months ago
tiannuo-yang / SearchAgent-X

#LLM# A High-Efficiency System of Large Language Model Based Search Agents

agent, AI, approximate-nearest-neighbor-search, information-retrieval, LLM, rag, vllm, llm-serving, post-training, rlhf
Python 73
2 months ago
ReinFlow / ReinFlow

Flow RL. ReinFlow: Fine-tuning Flow Policy with Online RL (Reinforcement Learning).

rl, Robotics, fine-tuning, post-training, humanoid, locomotion, manipulation, robot-learning, flow
Python 63
18 days ago
Jialuo-Li / Science-T2I

[CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis

benchmark, machine-vision, dataset, generative-model, post-training, science
Python 59
5 months ago
bobxwu / learning-from-rewards-llm-papers

#LLM# A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward models and learning strategies across training, inference, and post-...

large-language-models, LLM, post-training, reinforcement-learning
56
3 months ago
DolbyUUU / Logic-RL-Lite

#LLM# Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".

deepseek, deepseek-r1, fine-tuning, LLM, post-training, reinforcement-learning
Python 48
5 months ago
keshik6 / grafting

Exploring Diffusion Transformer Designs via Grafting

diffusion-models, diffusion-transformer, image-generation, text-to-image-generation, linear-attention, self-attention, post-training
Jupyter Notebook 48
3 months ago
taco-group / Re-Align

#LLM# A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.

alignment, dpo, hallucination, large-language-models, LLM, mllm, multimodal-large-language-models, post-training, ppo, rlhf, safety, vision-language-model, vlm
Python 45
24 days ago
UIC-Liu-Lab / CPT

#Natural Language Processing# [EMNLP 2022] Continual Training of Language Models for Few-Shot Learning

continual-learning, few-shot-learning, language-modeling, natural-language-processing, post-training, transformers
Python 45
3 years ago
DolbyUUU / DeepEnlighten

#LLM# Pure RL to post-train base models for social reasoning capabilities. Lightweight replication of DeepSeek-R1-Zero with Social IQa dataset.

deepseek, deepseek-r1, fine-tuning, LLM, post-training, reinforcement-learning
Python 38
6 months ago
complex-reasoning / RPG

#LLM# The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)

deep-learning, foundation-models, large-language-models, LLM, post-training, reinforcement-learning
Python 36
6 days ago
sastpg / RFTT

RFTT: Reasoning with Reinforced Functional Token Tuning

large-language-models, post-training, reasoning, reinforcement-learning, tree-search
Python 29
3 months ago