# human-feedback

https://static.github-zh.com/github_avatars/lucidrains?size=40

#Computer Science#Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

Python 7.87 k
14 days ago
https://static.github-zh.com/github_avatars/opendilab?size=40
4.14 k
4 days ago
https://static.github-zh.com/github_avatars/conceptofmind?size=40
Python 472
2 years ago
https://static.github-zh.com/github_avatars/huggingface?size=40
Jupyter Notebook 263
9 months ago
https://static.github-zh.com/github_avatars/yk7333?size=40

[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"

Python 239
1 year ago
https://static.github-zh.com/github_avatars/wxjiao?size=40

#Large Language Models#The ParroT framework to enhance and regulate the Translation Abilities during Chat based on open-sourced LLMs (e.g., LLaMA-7b, Bloomz-7b1-mt) and human-written translation and evaluation data.

Python 177
9 months ago
https://static.github-zh.com/github_avatars/xrsrke?size=40

#Large Language Models#Implementation of Reinforcement Learning from Human Feedback (RLHF)

Jupyter Notebook 172
2 years ago
https://static.github-zh.com/github_avatars/PKU-Alignment?size=40

#Data Warehouse#BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).

Makefile 157
2 years ago
https://static.github-zh.com/github_avatars/HannahKirk?size=40
Jupyter Notebook 79
1 year ago
https://static.github-zh.com/github_avatars/JD-GenX?size=40
Python 57
10 months ago
https://static.github-zh.com/github_avatars/davidberenstein1957?size=40

Dataset Viber is your chill repo for data collection, annotation and vibe checks.

Python 46
1 year ago
https://static.github-zh.com/github_avatars/gao-g?size=40

#Large Language Models#Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits".

Python 40
10 months ago
https://static.github-zh.com/github_avatars/ZiyiZhang27?size=40

[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"

Python 38
1 year ago
https://static.github-zh.com/github_avatars/AlaaLab?size=40

[NeurIPS 2023] Official Codebase for "Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback"

Python 19
2 years ago
https://static.github-zh.com/github_avatars/victor-iyi?size=40

Reinforcement Learning from Human Feedback with 🤗 TRL

Python 9
2 years ago
https://static.github-zh.com/github_avatars/RapidataAI?size=40
Python 6
2 months ago
https://static.github-zh.com/github_avatars/CogniSeeker?size=40

REactive Behavior Constraint-Aware Tree learning (REBCAT): a human-robot collaboration framework for learning tasks from demonstrations. Interpretable, fast, object-centric, and reactive.

Python 2
4 months ago
https://static.github-zh.com/github_avatars/JacqueWill?size=40
JavaScript 1
2 years ago