[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
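For quick orientation, SimPO scores responses with the length-normalized log-likelihood of the policy itself, dropping the reference model entirely; its objective, with a target reward margin $\gamma$, is:

$$\mathcal{L}_{\text{SimPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\tfrac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x) - \tfrac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x) - \gamma\right)\right]$$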
[ACL 2024 Findings] Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering
DPO-Shift: Shifting the Distribution of Direct Preference Optimization
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
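For context, $\beta$-DPO starts from the standard DPO objective, where $\beta$ controls how far the policy $\pi_\theta$ may drift from the reference $\pi_{\text{ref}}$:

$$\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \tfrac{\pi_\theta(y_w\mid x)}{\pi_{\text{ref}}(y_w\mid x)} - \beta \log \tfrac{\pi_\theta(y_l\mid x)}{\pi_{\text{ref}}(y_l\mid x)}\right)\right]$$

$\beta$-DPO replaces the fixed $\beta$ with a batch-level value calibrated from the observed implicit-reward margins; the exact calibration schedule is defined in the paper and repo.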
Video Generation Benchmark
Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24).
Code for "ReSpace: Text-Driven 3D Scene Synthesis and Editing with Preference Alignment"
[ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"
[ICLR 2025] Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization
[ICCV 2025] Official repository of "Mitigating Object Hallucinations via Sentence-Level Early Intervention".
[ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
[ICML 2025] "Preference Optimization for Combinatorial Optimization Problems"
Survey of preference alignment algorithms
Generate synthetic datasets for instruction tuning and preference alignment with tools like `distilabel`, enabling efficient and scalable data creation; a sketch of the underlying pattern follows.
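Below is a minimal sketch of the generate-then-judge pattern such tools automate. `generate` and `judge_prefers` are hypothetical stand-ins for a generator LLM and a judge LLM; in `distilabel` these would be pipeline steps backed by real models.

```python
import json
import random

# Hypothetical stand-ins for LLM calls; replace with real model-backed
# steps (e.g. distilabel pipeline components) in practice.
def generate(instruction: str, temperature: float) -> str:
    """Placeholder generator: returns a dummy response string."""
    return f"[response to '{instruction}' sampled at T={temperature}]"

def judge_prefers(instruction: str, a: str, b: str) -> bool:
    """Placeholder judge: True if response `a` is preferred over `b`."""
    return random.random() < 0.5  # a real judge LLM would rank here

instructions = ["Explain gradient descent.", "Summarize the DPO objective."]
pairs = []
for inst in instructions:
    # Sample two candidates at different temperatures, then rank them.
    a, b = generate(inst, 0.7), generate(inst, 1.0)
    chosen, rejected = (a, b) if judge_prefers(inst, a, b) else (b, a)
    pairs.append({"instruction": inst, "chosen": chosen, "rejected": rejected})

# Write chosen/rejected pairs in the JSONL layout preference trainers expect.
with open("preference_pairs.jsonl", "w") as f:
    f.write("\n".join(json.dumps(p) for p in pairs))
```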
Creating a GPT-2-Based Chatbot with Human Preferences
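To make the core idea concrete, here is a minimal reference-free pairwise step on GPT-2 (a SimPO/CPO-style simplification, not necessarily this repo's actual training loop); the prompt and the chosen/rejected replies are made-up examples.

```python
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def sequence_logprob(text: str):
    """Sum of token log-probabilities GPT-2 assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    logits = model(ids).logits
    # Shift so each position predicts the next token.
    logp = F.log_softmax(logits[:, :-1], dim=-1)
    token_logp = logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_logp.sum()

# Bradley-Terry pairwise loss: push the chosen reply above the rejected one.
prompt = "User: How do I brew coffee?\nBot:"
chosen, rejected = " Use a 1:15 coffee-to-water ratio.", " I don't know."
loss = -F.logsigmoid(sequence_logprob(prompt + chosen)
                     - sequence_logprob(prompt + rejected))
loss.backward()  # an optimizer step over many such pairs would follow
```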