# proximal-policy-optimization

https://static.github-zh.com/github_avatars/OpenRLHF?size=40

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python 7.93 k
20 hours ago
https://static.github-zh.com/github_avatars/vwxyzjn?size=40
Python 7.86 k
2 months ago
https://static.github-zh.com/github_avatars/ikostrikov?size=40

#Computer Science# PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) ...

Python 3.83 k
3 years ago
https://static.github-zh.com/github_avatars/Khrylx?size=40

PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.

Python 1.25 k
5 years ago
https://static.github-zh.com/github_avatars/TianhongDai?size=40

#Algorithm Practice# This repository contains PyTorch implementations of most classic deep reinforcement learning algorithms, including DQN, DDQN, Dueling Network, DDPG, SAC, A2C, PPO, TRPO. (More algorithms are st...

Python 683
5 years ago
https://static.github-zh.com/github_avatars/miroblog?size=40
Python 255
3 years ago
https://static.github-zh.com/github_avatars/lcswillems?size=40
Python 205
3 years ago
https://static.github-zh.com/github_avatars/VachanVY?size=40

PyTorch implementations of algorithms from "Reinforcement Learning: An Introduction" by Sutton and Barto, along with various RL research papers.

Python 179
1 month ago
https://static.github-zh.com/github_avatars/adik993?size=40
Python 147
7 years ago