human-feedback

#Computer Science# Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

artificial-intelligence attention-mechanisms deep-learning reinforcement-learning transformers human-feedback

Python 7.87 k

14 days ago

opendilab / awesome-RLHF

#Computer Science# A curated list of reinforcement learning with human feedback resources (continually updated)

deep-learning deep-reinforcement-learning human-feedback reinforcement-learning rlhf large-language-models

4.14 k

4 days ago

conceptofmind / LaMDA-rlhf-pytorch

#Computer Science# Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding RLHF similar to ChatGPT.

attention-mechanism deep-learning machine-learning artificial-intelligence human-feedback reinforcement-learning transformers

Python 472

2 years ago

huggingface / data-is-better-together

#Data Warehouse# Let's build better datasets, together!

community dataset human-feedback machine-learning

Jupyter Notebook 263

9 months ago

yk7333 / d3po

[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"

diffusion-models human-feedback reinforcement-learning

Python 239

1 year ago

wxjiao / ParroT

#Large Language Models# The ParroT framework to enhance and regulate translation abilities during chat, based on open-source LLMs (e.g., LLaMA-7b, Bloomz-7b1-mt) and human-written translation and evaluation data.

ChatGPT gpt-4 llama machine-translation human-feedback instruction-tuning lora

Python 177

9 months ago

xrsrke / instructGOOSE

#Large Language Models# Implementation of Reinforcement Learning from Human Feedback (RLHF)

reinforcement-learning rlhf ChatGPT human-feedback

Jupyter Notebook 172

2 years ago

PKU-Alignment / beavertails

#Data Warehouse# BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).

ai-safety human-feedback language-model large-language-model rlhf safety beaver dataset gpt llama

Makefile 157

2 years ago

trubrics / trubrics-python

#Large Language Models# Product analytics for AI Assistants

machine-learning ml-monitoring mlops human-feedback large-language-model llmops Streamlit

Python 154

4 months ago

HannahKirk / prism-alignment

The Prism Alignment Project

alignment dataset human-feedback

Jupyter Notebook 79

1 year ago

JD-GenX / Reliable_AD

#Data Warehouse# [ECCV2024] Towards Reliable Advertising Image Generation Using Human Feedback

advertising diffusers diffusion diffusion-models eccv2024 human-feedback image-generation rlhf dataset

Python 57

10 months ago

davidberenstein1957 / dataset-viber

Dataset Viber is your chill repo for data collection, annotation and vibe checks.

data-collection data-quality evaluation human-feedback

Python 46

1 year ago

gao-g / prelude

#Large Language Models# Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits".

alignment gpt4 human-feedback interpretability large-language-model transformers

Python 40

10 months ago

ZiyiZhang27 / tdpo

[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"

alignment diffusion-models human-feedback reinforcement-learning rlhf text-to-image stable-diffusion

Python 38

1 year ago

AlaaLab / pathologist-in-the-loop

[ NeurIPS 2023 ] Official Codebase for "Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback"

human-feedback rlhf synthetic-data

Python 19

2 years ago

wang8740 / MAP

#Large Language Models# Documentation at

finetuning human-feedback large-language-model rlhf

Python 11

6 months ago

victor-iyi / rlhf-trl

Reinforcement Learning from Human Feedback with 🤗 TRL

human-feedback reinforcment-learning rlhf

Python 9

2 years ago

RapidataAI / crowd-eval

#Computer Science# Break out of the AI training bubble

human-feedback machine-learning wandb

Python 6

2 months ago

CogniSeeker / REBCAT

REactive Behavior Constraint-Aware Tree learning (REBCAT): a human-robot collaboration framework for learning tasks from demonstrations. Interpretable, fast, object-centric, and reactive.

behavior-trees decision-tree-classifier human-feedback interpretable-ai

Python 2

4 months ago

JacqueWill / SEO_HIF_JS

#Computer Science# Search Engine Optimization using Human Implicit Feedback

data-privacy edge-computing human-feedback machine-learning seo-optimization

JavaScript 1

2 years ago
