[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
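For quick orientation, SimPO scores responses with the length-normalized log-likelihood of the policy itself, dropping the reference model entirely; its objective, with a target reward margin $\gamma$, is:

$$\mathcal{L}_{\text{SimPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\tfrac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x) - \tfrac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x) - \gamma\right)\right]$$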
[ACL 2024 Findings] Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering
DPO-Shift: Shifting the Distribution of Direct Preference Optimization
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
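For context, $\beta$-DPO starts from the standard DPO objective, where $\beta$ controls how far the policy $\pi_\theta$ may drift from the reference $\pi_{\text{ref}}$:

$$\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \tfrac{\pi_\theta(y_w\mid x)}{\pi_{\text{ref}}(y_w\mid x)} - \beta \log \tfrac{\pi_\theta(y_l\mid x)}{\pi_{\text{ref}}(y_l\mid x)}\right)\right]$$

$\beta$-DPO replaces the fixed $\beta$ with a batch-level value calibrated from the observed implicit-reward margins; the exact calibration schedule is defined in the paper and repo.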
Video Generation Benchmark
Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24).
Code for "ReSpace: Text-Driven 3D Scene Synthesis and Editing with Preference Alignment"
[ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"
[ICLR 2025] Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization
[ICCV 2025] Official repository of "Mitigating Object Hallucinations via Sentence-Level Early Intervention".
[ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
[ICML 2025] "Preference Optimization for Combinatorial Optimization Problems"
Survey of preference alignment algorithms
Generate synthetic datasets for instruction tuning and preference alignment with tools like `distilabel`, enabling efficient and scalable data creation; a sketch of the underlying pattern follows.
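Below is a minimal sketch of the generate-then-judge pattern such tools automate. `generate` and `judge_prefers` are hypothetical stand-ins for a generator LLM and a judge LLM; in `distilabel` these would be pipeline steps backed by real models.

```python
import json
import random

# Hypothetical stand-ins for LLM calls; replace with real model-backed
# steps (e.g. distilabel pipeline components) in practice.
def generate(instruction: str, temperature: float) -> str:
    """Placeholder generator: returns a dummy response string."""
    return f"[response to '{instruction}' sampled at T={temperature}]"

def judge_prefers(instruction: str, a: str, b: str) -> bool:
    """Placeholder judge: True if response `a` is preferred over `b`."""
    return random.random() < 0.5  # a real judge LLM would rank here

instructions = ["Explain gradient descent.", "Summarize the DPO objective."]
pairs = []
for inst in instructions:
    # Sample two candidates at different temperatures, then rank them.
    a, b = generate(inst, 0.7), generate(inst, 1.0)
    chosen, rejected = (a, b) if judge_prefers(inst, a, b) else (b, a)
    pairs.append({"instruction": inst, "chosen": chosen, "rejected": rejected})

# Write chosen/rejected pairs in the JSONL layout preference trainers expect.
with open("preference_pairs.jsonl", "w") as f:
    f.write("\n".join(json.dumps(p) for p in pairs))
```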
Creating a GPT-2-Based Chatbot with Human Preferences
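To make the core idea concrete, here is a minimal reference-free pairwise step on GPT-2 (a SimPO/CPO-style simplification, not necessarily this repo's actual training loop); the prompt and the chosen/rejected replies are made-up examples.

```python
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def sequence_logprob(text: str):
    """Sum of token log-probabilities GPT-2 assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    logits = model(ids).logits
    # Shift so each position predicts the next token.
    logp = F.log_softmax(logits[:, :-1], dim=-1)
    token_logp = logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_logp.sum()

# Bradley-Terry pairwise loss: push the chosen reply above the rejected one.
prompt = "User: How do I brew coffee?\nBot:"
chosen, rejected = " Use a 1:15 coffee-to-water ratio.", " I don't know."
loss = -F.logsigmoid(sequence_logprob(prompt + chosen)
                     - sequence_logprob(prompt + rejected))
loss.backward()  # an optimizer step over many such pairs would follow
```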