GitHub 中文社区 (GitHub Chinese Community)

©2025 GitHub Chinese Community Forum


mixture-of-experts

deepspeedai / DeepSpeed

DeepSpeed Chat: one-click RLHF training that makes your ChatGPT-like hundred-billion-parameter models up to 15x faster and cheaper to train.

deep-learning, PyTorch, gpu, machine-learning, billion-parameters, data-parallelism, model-parallelism, inference, pipeline-parallelism, compression, mixture-of-experts, trillion-parameters, zero

Python · 38.87k · updated 16 hours ago
codelion / optillm

Optimizing inference proxy for LLMs.

agent, agentic-ai, agentic-workflow, agents, API, genai, large-language-models, llm, llm-inference, llmapi, mixture-of-experts, moa, openai, openai-api, optimization, proxy-server, agentic-framework, chain-of-thought, monte-carlo-tree-search, prompt-engineering

Python · 2.52k · updated 2 days ago
dvmazur / mixtral-offloading

Run Mixtral-8x7B models in Colab or on consumer desktops.

colab-notebook, deep-learning, google-colab, language-model, llm, mixture-of-experts, offloading, PyTorch, quantization

Python · 2.31k · updated 1 year ago
learning-at-home / hivemind

Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.

deep-learning, PyTorch, volunteer-computing, mixture-of-experts, distributed-training, distributed-systems, asynchronous-programming, asyncio, dht, neural-networks, machine-learning

Python · 2.21k · updated 1 month ago
PKU-YuanGroup / MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models.

large-vision-language-model, mixture-of-experts, moe, multi-modal

Python · 2.18k · updated 6 months ago
davidmrau / mixture-of-experts

PyTorch re-implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. (https://arxiv.org/abs/1701.06538)

moe, mixture-of-experts, PyTorch

Python · 1.12k · updated 1 year ago
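The gating trick these re-implementations share is easy to state: a softmax over expert logits ranks the experts, only the top-k actually run, and their outputs are mixed with the renormalized gate weights. A minimal pure-Python sketch of that routing step (toy scalar experts and hand-fixed gate logits for illustration, not code from any repository listed here):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_logits, k=2):
    """Sparsely-gated MoE step: run only the top-k experts and mix
    their outputs with renormalized softmax gate weights."""
    probs = softmax(gate_logits)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over selected experts
    return sum(probs[i] / norm * experts[i](x) for i in top)

# Four toy "experts", each a scalar function of the input.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x ** 2, lambda x: -x]
# In a real layer the gate logits come from a learned linear map of x;
# they are fixed here so the routing is easy to follow.
y = moe_forward(3.0, experts, gate_logits=[2.0, 1.0, -1.0, -2.0], k=2)
```

With these logits only the first two experts fire, so `y` is a weighted blend of `x + 1` and `2 * x`; the other two experts are never evaluated, which is where the compute savings of sparse gating come from.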
rhymes-ai / Aria

Codebase for Aria, an open multimodal-native MoE model.

mixture-of-experts, multimodal, vision-and-language

Jupyter Notebook · 1.05k · updated 5 months ago
pjlab-sys4nlp / llama-moe

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)

llama, llm, mixture-of-experts, moe

Python · 967 · updated 6 months ago
microsoft / Tutel

Tutel MoE: an optimized Mixture-of-Experts library; supports DeepSeek FP8/FP4.

PyTorch, moe, mixture-of-experts, deepseek, llm

C · 839 · updated 8 days ago
SMTorg / smt

Surrogate Modeling Toolbox

derivative, sampling, mixture-of-experts, predictive-modeling, machine-learning

Jupyter Notebook · 773 · updated 10 days ago
lucidrains / mixture-of-experts

A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models.

artificial-intelligence, deep-learning, transformer, mixture-of-experts

Python · 763 · updated 2 years ago
AviSoori1x / makeMoE

From-scratch implementation of a sparse mixture-of-experts language model, inspired by Andrej Karpathy's makemore :)

large-language-models, llm, mixture-of-experts, deep-learning, neural-networks, PyTorch, pytorch-implementation

Jupyter Notebook · 719 · updated 8 months ago
drawbridge / keras-mmoe

A TensorFlow Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018)

machine-learning, deep-learning, data-science, deep-neural-networks, Keras, TensorFlow, multi-task-learning, mixture-of-experts

Python · 713 · updated 2 years ago
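MMoE extends the idea to multi-task learning: all tasks share one pool of experts, but each task has its own softmax gate, so different tasks can weight the same experts differently. A toy pure-Python sketch of that routing (fixed illustrative gate logits in place of the learned per-task gate networks):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def mmoe_forward(x, experts, task_gate_logits):
    """Multi-gate MoE: one shared expert pool, one softmax gate per task.
    Returns one mixed expert output per task."""
    outs = [f(x) for f in experts]  # every expert runs once, shared by all tasks
    results = []
    for logits in task_gate_logits:
        w = softmax(logits)  # this task's weighting of the shared experts
        results.append(sum(wi * oi for wi, oi in zip(w, outs)))
    return results

# Three toy scalar experts shared by both tasks.
experts = [lambda x: x, lambda x: x * x, lambda x: -x]
# Two tasks with different (fixed, illustrative) gate logits over the
# same experts; in MMoE these logits come from per-task gate networks.
task_a, task_b = mmoe_forward(2.0, experts, [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
```

Here task A's gate concentrates on the identity expert and task B's on the squaring expert, so the two task heads receive mostly-disjoint mixtures of the same shared computation, which is the mechanism the KDD 2018 paper uses to model task relationships.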
ymcui / Chinese-Mixtral

Chinese Mixtral mixture-of-experts large models (Chinese Mixtral MoE LLMs)

large-language-models, llm, mixtral, mixture-of-experts, moe, nlp

Python · 605 · updated 1 year ago
Leeroo-AI / mergoo

A library for easily merging multiple LLM experts and efficiently training the merged LLM.

generative-ai, llm, merge, mixture-of-experts, nlp, fine-tuning, large-language-models, lora, artificial-intelligence, transformers, multi-model, open-source

Python · 480 · updated 10 months ago
lucidrains / st-moe-pytorch

Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch.

artificial-intelligence, deep-learning, mixture-of-experts

Python · 340 · updated 1 year ago
lucidrains / soft-moe-pytorch

Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch.

artificial-intelligence, deep-learning, mixture-of-experts, transformers

Python · 297 · updated 2 months ago
Luodian / Generalizable-Mixture-of-Experts

GMoE could be the next backbone model for many kinds of generalization tasks.

deep-learning, domain-generalization, PyTorch, pytorch-implementation, mixture-of-experts

Python · 272 · updated 2 years ago
SkyworkAI / MoH

MoH: Multi-Head Attention as Mixture-of-Head Attention

attention, dit, llm, mixture-of-experts, moe, transformer, vit

Python · 251 · updated 8 months ago
inferflow / inferflow

Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

llama2, llamacpp, llm-inference, model-quantization, multi-gpu-inference, mixture-of-experts, moe, gemma, falcon, minicpm, mistral, bloom, deepseek, internlm, baichuan2, mixtral, qwen

C++ · 242 · updated 1 year ago