GitHub Chinese Community

©2025 GitHub Chinese Community Forum

Topic: dpo
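Most of the machine-learning repositories listed below implement some variant of Direct Preference Optimization (DPO). As background, the published DPO objective can be sketched in a few lines; this is a minimal illustrative example (the function name and inputs are my own, not code from any listed project):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a response under
    the trainable policy (logp_*) or the frozen reference (ref_logp_*).
    """
    # Implicit reward of each response: beta * log(pi_theta / pi_ref)
    chosen_logratio = logp_chosen - ref_logp_chosen
    rejected_logratio = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # Negative log-sigmoid of the margin: the loss shrinks as the policy
    # prefers the chosen response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the margin is zero and the loss is log 2; it falls below that as soon as the policy favors the chosen response. Library implementations (e.g. batched, with label smoothing) differ in detail but follow this shape.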
oumi-ai / oumi

Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!

Tags: dpo, evaluation, fine-tuning, inference, llama, llm, sft, vlms
Python · 8.18k stars
2 days ago
shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical LLMs, implementing continued pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.

Tags: llama, ChatGPT, gpt, llm, medical, dpo
Python · 3.94k stars
16 days ago
PKU-Alignment / align-anything

Align Anything: Training All-modality Model with Feedback

Tags: large-language-models, multimodal, rlhf, chameleon, dpo, vision-language-model
Jupyter Notebook · 3.93k stars
19 days ago
ContextualAI / HALOs

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

Tags: alignment, dpo, ppo, rlhf
Python · 855 stars
8 days ago
jianzhnie / LLamaTuner

Easy and efficient fine-tuning of LLMs (supports LLaMA, LLaMA 2, LLaMA 3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.

Tags: llama, ChatGPT, dpo, llama3, mixtral, ppo, qlora, qwen, rlhf
Python · 606 stars
5 months ago
zhaorw02 / DeepMesh

Official code of DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Tags: 3d, aigc, dpo, generative-model, llm, mesh, mesh-generation, point-cloud
Python · 587 stars
6 days ago
ukairia777 / tensorflow-nlp-tutorial

A deep-learning NLP repository that uses TensorFlow to walk through everything from text preprocessing to downstream tasks of recent models such as topic models, BERT, GPT, and LLMs.

Tags: tensorflow, nlp, question-answering, named-entity-recognition, bert-ner, bert, llm, dpo, llama, sft, huggingface, transformers, lora, trainer
Jupyter Notebook · 544 stars
1 month ago
sail-sg / oat

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

Tags: alignment, dpo, llm, rlhf, distributed-training, reasoning, grpo, ppo
Python · 376 stars
5 days ago
dvlab-research / Step-DPO

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

Tags: dpo, llm, math, reasoning
Python · 369 stars
5 months ago
TUDB-Labs / mLoRA

An Efficient "Factory" to Build Multiple LoRA Adapters

Tags: baichuan, chatglm, finetune, llama, llama2, llm, lora, peft, gpu, dpo, rlhf
Python · 322 stars
4 months ago
armbues / SiLLM

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.

Tags: apple-silicon, dpo, large-language-models, llm, llm-inference, llm-training, lora, mlx
Python · 271 stars
21 days ago
RockeyCoss / SPO

[CVPR 2025] Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization

Tags: diffusion-models, dpo, sdxl, text-to-image, text-to-image-generation
Python · 216 stars
2 months ago
YangLing0818 / IterComp

[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

Tags: dpo, rlhf, text-to-image
Python · 189 stars
4 months ago
TideDra / VL-RLHF

An RLHF Infrastructure for Vision-Language Models

Tags: dpo, llm, lmm, mllm, rlhf, vlm
Python · 176 stars
7 months ago
argilla-io / notus

Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach.

Tags: dpo, fine-tuning, zephyr
Python · 168 stars
1 year ago
anilca / NetTrader.Indicator

Technical analysis library for .NET

Tags: bollinger-bands, cmf, dpo, macd, momentum, pvt, sar
C# · 142 stars
9 months ago
NiuTrans / Vision-LLM-Alignment

This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.

Tags: vision, dpo, llm, rlhf, sft, ppo, alignment, mllm, multi-model, llava
Python · 109 stars
8 months ago
codelion / pts

Pivotal Token Search

Tags: dpo, llm, llm-inference, phi4, tokens
Python · 101 stars
1 month ago
Goekdeniz-Guelmez / mlx-lm-lora

Train Large Language Models on MLX.

Tags: apple, deep-learning, dpo, grpo, machine-learning, mlx, sft, training
Python · 84 stars
7 days ago
martin-wey / CodeUltraFeedback

CodeUltraFeedback: aligning large language models to coding preferences

Tags: alignment, code-generation, dpo, large-language-models, llm-as-a-judge
Python · 71 stars
1 year ago