GitHub Chinese Community
#human-feedback

lucidrains / PaLM-rlhf-pytorch

#Computer Science# Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

artificial-intelligence, attention-mechanisms, deep-learning, reinforcement-learning, transformers, human-feedback
Python 7.84 k
2 months ago
opendilab / awesome-RLHF

#Computer Science# A curated list of reinforcement learning with human feedback resources (continually updated)

deep-learning, deep-reinforcement-learning, human-feedback, reinforcement-learning, rlhf, large-language-models
3.98 k
2 months ago
conceptofmind / LaMDA-rlhf-pytorch

#Computer Science# Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding RLHF similar to ChatGPT.

attention-mechanism, deep-learning, machine-learning, artificial-intelligence, human-feedback, reinforcement-learning, transformers
Python 472
1 year ago
huggingface / data-is-better-together

#Data Warehouse# Let's build better datasets, together!

community, dataset, human-feedback, machine-learning
Jupyter Notebook 259
6 months ago
yk7333 / d3po

[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"

diffusion-models, human-feedback, reinforcement-learning
Python 229
1 year ago
wxjiao / ParroT

#Large Language Models# The ParroT framework to enhance and regulate the Translation Abilities during Chat based on open-sourced LLMs (e.g., LLaMA-7b, Bloomz-7b1-mt) and human-written translation and evaluation data.

ChatGPT, gpt-4, llama, machine-translation, human-feedback, instruction-tuning, lora
Python 175
6 months ago
xrsrke / instructGOOSE

#Large Language Models# Implementation of Reinforcement Learning from Human Feedback (RLHF)

reinforcement-learning, rlhf, ChatGPT, human-feedback
Jupyter Notebook 172
2 years ago
trubrics / trubrics-python

#Large Language Models# Product analytics for AI Assistants

machine-learning, ml-monitoring, mlops, human-feedback, large-language-model, llmops, Streamlit
Python 153
21 days ago
PKU-Alignment / beavertails

#Data Warehouse# BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).

ai-safety, human-feedback, language-model, large-language-model, rlhf, safety, beaver, dataset, gpt, llama
Makefile 143
2 years ago
HannahKirk / prism-alignment

The Prism Alignment Project

alignment, dataset, human-feedback
Jupyter Notebook 76
1 year ago
ZhenbangDu / Reliable_AD

#Data Warehouse# [ECCV 2024] Towards Reliable Advertising Image Generation Using Human Feedback

advertising, diffusers, diffusion, diffusion-models, eccv2024, human-feedback, image-generation, rlhf, dataset
Python 49
7 months ago
davidberenstein1957 / dataset-viber

Dataset Viber is your chill repo for data collection, annotation and vibe checks.

data-collection, data-quality, evaluation, human-feedback
Python 47
9 months ago
gao-g / prelude

#Large Language Models# Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits".

alignment, gpt4, human-feedback, interpretability, large-language-model, transformers
Python 40
7 months ago
ZiyiZhang27 / tdpo

[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"

alignment, diffusion-models, human-feedback, reinforcement-learning, rlhf, text-to-image, stable-diffusion
Python 35
1 year ago
AlaaLab / pathologist-in-the-loop

[NeurIPS 2023] Official Codebase for "Aligning Synthetic Medical Images with Clinical Knowledge using Human Feedback"

human-feedback, rlhf, synthetic-data
Python 19
2 years ago
wang8740 / MAP

#Large Language Models# Documentation at

finetuning, human-feedback, large-language-model, rlhf
Python 9
3 months ago
victor-iyi / rlhf-trl

Reinforcement Learning from Human Feedback with 🤗 TRL

human-feedback, reinforcment-learning, rlhf
Python 9
2 years ago
CogniSeeker / REBCAT

REactive Behavior Constraint-Aware Tree learning (REBCAT) - a human-robot collaboration framework to learn task from demonstrations. Interpretable, fast, object-centric, and reactive.

behavior-trees, decision-tree-classifier, human-feedback, interpretable-ai
Python 1
18 days ago
JacqueWill / SEO_HIF_JS

#Computer Science# Search Engine Optimization using Human Implicit Feedback

data-privacy, edge-computing, human-feedback, machine-learning, seo-optimization
JavaScript 1
2 years ago
cluebbers / dpo-rlhf-paraphrase-types

#Computer Science# Enhancing paraphrase-type generation using Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF), with large-scale HPC support. This project aligns model outputs t...

alignment, deep-learning, human-feedback, reinforcement-learning, transformers
Jupyter Notebook 0
4 months ago