GitHub Chinese Community
#reinforcement-learning-from-human-feedback

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agent RL)

transformers, vllm, large-language-models, raylib, reinforcement-learning-from-human-feedback, reinforcement-learning, openai-o1, proximal-policy-optimization
Python 7.08k
2 days ago
PKU-Alignment / safe-rlhf

#Data Warehouse# Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

ai-safety, alpaca, datasets, deepspeed, large-language-models, llama, llm, reinforcement-learning, reinforcement-learning-from-human-feedback, rlhf, transformers, vicuna, safety, gpt, transformer, beaver
Python 1.49k
1 year ago
tatsu-lab / alpaca_farm

#Natural Language Processing# A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.

deep-learning, instruction-following, large-language-models, reinforcement-learning-from-human-feedback, natural-language-processing
Python 813
1 year ago
openpsi-project / ReaLHF

#Large Language Models# Super-Efficient RLHF Training of LLMs with Parameter Reallocation

llm, llm-training, reinforcement-learning-from-human-feedback, reinforcement-learning, distributed-systems, distributed-computing, large-language-models, llm-framework, deepspeed, transformers
Python 300
2 months ago
nlp-uoregon / Okapi

#Natural Language Processing# Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

bloom, chatbot, dataset, instruction-tuning, language-model, large-language-models, multilingual, natural-language-processing, question-answering, reinforcement-learning, reinforcement-learning-from-human-feedback, rlhf, llama
Python 96
2 years ago
tlc4418 / llm_optimization

#Computer Science# A repo for RLHF training and BoN over LLMs, with support for reward model ensembles.

deep-learning, large-language-models, reinforcement-learning-from-human-feedback
Python 43
5 months ago
liushunyu / awesome-direct-preference-optimization

#Large Language Models# A Survey of Direct Preference Optimization (DPO)

alignment, llm, large-language-models, reinforcement-learning-from-human-feedback, dpo, code-review, survey
41
3 months ago
CJReinforce / RIME_ICML2024

#Computer Science# Official code for ICML 2024 paper, "RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences" (ICML 2024 Spotlight)

artificial-intelligence, deep-learning, reinforcement-learning, reinforcement-learning-from-human-feedback, locomotion, manipulation, Robotics
Python 29
8 months ago
clam004 / minichatgpt

#Natural Language Processing# Annotated tutorial of the Hugging Face TRL repo for reinforcement learning from human feedback, connecting equations from PPO and GAE to the lines of code in the PyTorch implementation

deep-learning, deep-reinforcement-learning, fine-tuning, language-model, large-language-models, natural-language-processing, PyTorch, reinforcement-learning, transformers, reinforcement-learning-from-human-feedback
Jupyter Notebook 20
2 months ago
XplainMind / LLMindCraft

Shaping Language Models with Cognitive Insights

Docker, instruct-tuning, large-language-models, pretraining, reinforcement-learning-from-human-feedback, deepspeed, transformers
Python 13
1 year ago
ymetz / rlhfblender

RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback

experimentation, Python, React, reinforcement-learning, reinforcement-learning-from-human-feedback
Python 12
13 days ago
flint-xf-fan / Federated-RLHF

[AAMAS 2025] Privacy-preserving and Personalized RLHF, with convergence guarantees. The Code contains experiments for training multiple instances of GPT-2 for personalized sentiment aligned text gener...

llm, reinforcement-learning-from-human-feedback, rft, rlhf
Python 9
2 months ago
liushunyu / Ask-AC

[TSMC] Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework

reinforcement-learning, reinforcement-learning-from-human-feedback
Python 8
1 year ago
rosinality / halite

Acceleration framework for Human Alignment Learning

evaluation-framework, inference, large-language-models, proximal-policy-optimization, reinforcement-learning, reinforcement-learning-from-human-feedback, transformers
Python 6
12 days ago
SJ9VRF / Reinforcement-Learning-for-Human-Feedback-RLHF

This repository contains the implementation of a Reinforcement Learning with Human Feedback (RLHF) system using custom datasets. The project utilizes the trlX library for training a preference model t...

language-model, llm, reinforcement-learning-from-human-feedback, rlhf
Python 4
10 months ago
Almost-Intelligence / LMRax

LMRax is a framework built on JAX for training transformer language models with reinforcement learning, along with reward model training.

jax, language-model, reinforcement-learning, reinforcement-learning-from-human-feedback, transformer
Python 2
2 years ago
Chinmaya-Kausik / RLHF-comparison

#Large Language Models# Comparing various RLHF methods

dpo, llm, ppo, reinforcement-learning, reinforcement-learning-from-human-feedback, rlhf, transformer, transformers
Jupyter Notebook 0
9 months ago
umenzi / diversity-rlhf

Code for Bachelor thesis, The Human Factor: Addressing Diversity in Reinforcement Learning from Human Feedback.

reinforcement-learning, reinforcement-learning-from-human-feedback, rlhf
Python 0
10 months ago
ymnseol / weekly-paper-reading-group

#Natural Language Processing# Summaries of papers related to the alignment problem in NLP

instruction-tuning, natural-language-processing, reinforcement-learning-from-human-feedback, rlhf
0
2 years ago
satyampurwar / large-language-models

Unlocking the Power of Generative AI: In-Context Learning, Instruction Fine-Tuning and Reinforcement Learning Fine-Tuning.

bert, conda-environment, flan-t5, generative-ai, large-language-models, memory-management, prompt-engineering, proximal-policy-optimization, reinforcement-learning-from-human-feedback, storage-management, encoder-decoder-model, model-quantization
Jupyter Notebook 0
8 months ago