GitHub Chinese Community

©2025 GitHub Chinese Community Forum

Topic: dpo
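Most of the machine-learning repositories listed below implement some variant of Direct Preference Optimization (DPO). As background, the published DPO objective can be sketched in a few lines; this is a minimal illustrative example (the function name and inputs are my own, not code from any listed project):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a response under
    the trainable policy (logp_*) or the frozen reference (ref_logp_*).
    """
    # Implicit reward of each response: beta * log(pi_theta / pi_ref)
    chosen_logratio = logp_chosen - ref_logp_chosen
    rejected_logratio = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # Negative log-sigmoid of the margin: the loss shrinks as the policy
    # prefers the chosen response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the margin is zero and the loss is log 2; it falls below that as soon as the policy favors the chosen response. Library implementations (e.g. batched, with label smoothing) differ in detail but follow this shape.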
oumi-ai / oumi

Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!

Tags: dpo, evaluation, fine-tuning, inference, llama, llm, sft, vlms
Python · 8.18k stars
2 days ago
shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical LLMs, implementing continued pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.

Tags: llama, ChatGPT, gpt, llm, medical, dpo
Python · 3.94k stars
16 days ago
PKU-Alignment / align-anything

Align Anything: Training All-modality Model with Feedback

Tags: large-language-models, multimodal, rlhf, chameleon, dpo, vision-language-model
Jupyter Notebook · 3.93k stars
19 days ago
ContextualAI / HALOs

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

Tags: alignment, dpo, ppo, rlhf
Python · 855 stars
8 days ago
jianzhnie / LLamaTuner

Easy and efficient fine-tuning of LLMs (supports LLaMA, LLaMA 2, LLaMA 3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.

Tags: llama, ChatGPT, dpo, llama3, mixtral, ppo, qlora, qwen, rlhf
Python · 606 stars
5 months ago
zhaorw02 / DeepMesh

Official code of DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Tags: 3d, aigc, dpo, generative-model, llm, mesh, mesh-generation, point-cloud
Python · 587 stars
6 days ago
ukairia777 / tensorflow-nlp-tutorial

A deep-learning NLP repository that uses TensorFlow to walk through everything from text preprocessing to downstream tasks of recent models such as topic models, BERT, GPT, and LLMs.

Tags: tensorflow, nlp, question-answering, named-entity-recognition, bert-ner, bert, llm, dpo, llama, sft, huggingface, transformers, lora, trainer
Jupyter Notebook · 544 stars
1 month ago
sail-sg / oat

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

Tags: alignment, dpo, llm, rlhf, distributed-training, reasoning, grpo, ppo
Python · 376 stars
5 days ago
dvlab-research / Step-DPO

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

Tags: dpo, llm, math, reasoning
Python · 369 stars
5 months ago
TUDB-Labs / mLoRA

An Efficient "Factory" to Build Multiple LoRA Adapters

Tags: baichuan, chatglm, finetune, llama, llama2, llm, lora, peft, gpu, dpo, rlhf
Python · 322 stars
4 months ago
armbues / SiLLM

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.

Tags: apple-silicon, dpo, large-language-models, llm, llm-inference, llm-training, lora, mlx
Python · 271 stars
21 days ago
RockeyCoss / SPO

[CVPR 2025] Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization

Tags: diffusion-models, dpo, sdxl, text-to-image, text-to-image-generation
Python · 216 stars
2 months ago
YangLing0818 / IterComp

[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

Tags: dpo, rlhf, text-to-image
Python · 189 stars
4 months ago
TideDra / VL-RLHF

An RLHF Infrastructure for Vision-Language Models

Tags: dpo, llm, lmm, mllm, rlhf, vlm
Python · 176 stars
7 months ago
argilla-io / notus

Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach.

Tags: dpo, fine-tuning, zephyr
Python · 168 stars
1 year ago
anilca / NetTrader.Indicator

Technical analysis library for .NET

Tags: bollinger-bands, cmf, dpo, macd, momentum, pvt, sar
C# · 142 stars
9 months ago
NiuTrans / Vision-LLM-Alignment

This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.

Tags: vision, dpo, llm, rlhf, sft, ppo, alignment, mllm, multi-model, llava
Python · 109 stars
8 months ago
codelion / pts

Pivotal Token Search

Tags: dpo, llm, llm-inference, phi4, tokens
Python · 101 stars
1 month ago
Goekdeniz-Guelmez / mlx-lm-lora

Train Large Language Models on MLX.

Tags: apple, deep-learning, dpo, grpo, machine-learning, mlx, sft, training
Python · 84 stars
7 days ago
martin-wey / CodeUltraFeedback

CodeUltraFeedback: aligning large language models to coding preferences

Tags: alignment, code-generation, dpo, large-language-models, llm-as-a-judge
Python · 71 stars
1 year ago