#NLP# Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
#LLM# SGLang is a fast serving framework for large language models and vision language models.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs.
FlashInfer: Kernel Library for LLM Serving
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
#LLM# MoBA: Mixture of Block Attention for Long-Context LLMs (see the block-attention sketch after this list)
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538 (see the gating sketch after this list)
#LLM# ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
#LLM# Tutel MoE: an optimized Mixture-of-Experts library supporting DeepSeek/Kimi-K2/Qwen3 FP8/FP4
#Computer Science# Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
#LLM# A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI
#LLM# An open-source solution for full-parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to inference, as well as practical experience and conclusions.
#NLP# Chinese Mixtral Mixture-of-Experts large language models (Chinese Mixtral MoE LLMs)
#NLP# MindSpore online courses: Step into LLM
#Android# Official LISTEN.moe Android app
MoH: Multi-Head Attention as Mixture-of-Head Attention (see the head-routing sketch after this list)
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
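For the MoBA entry above, a minimal sketch of the block-attention idea it names, not MoBA's actual kernels or API: keys/values are grouped into fixed-size blocks, each query scores the blocks by their mean key, and attention is restricted to the top-k chosen blocks. The function name, shapes, and defaults are illustrative assumptions, and the causal masking used in decoder models is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def mixture_of_block_attention(q, k, v, block_size=64, top_k=2):
    """Hypothetical sketch: each query attends only to its top_k key blocks,
    chosen by similarity between the query and the mean key of each block."""
    T, d = k.shape
    n_blocks = (T + block_size - 1) // block_size
    top_k = min(top_k, n_blocks)
    pad = n_blocks * block_size - T

    # Score each block by its mean key and pick the top_k blocks per query.
    k_pad = F.pad(k, (0, 0, 0, pad))
    block_means = k_pad.view(n_blocks, block_size, d).mean(dim=1)    # (n_blocks, d)
    chosen = (q @ block_means.t()).topk(top_k, dim=-1).indices       # (T, top_k)

    # Allow a query-key pair only if the key's block was chosen by the query.
    block_id = torch.arange(T, device=q.device) // block_size        # key -> block id
    allow = (chosen.unsqueeze(-1) == block_id.view(1, 1, T)).any(dim=1)  # (T, T)

    scores = (q @ k.t()) / d ** 0.5
    scores = scores.masked_fill(~allow, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```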
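For the Shazeer et al. sparsely-gated MoE entry, a minimal PyTorch sketch of the core mechanism: a gating network picks the top-k experts per token and the layer output is the gate-weighted sum of those experts' outputs. The class name and dimensions are illustrative; the paper's noisy gating and load-balancing loss are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative top-k gated mixture-of-experts layer (not the repo's code)."""
    def __init__(self, d_model=512, d_hidden=2048, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (num_tokens, d_model)
        logits = self.gate(x)                               # (tokens, experts)
        topk_val, topk_idx = logits.topk(self.k, dim=-1)    # keep k experts per token
        weights = F.softmax(topk_val, dim=-1)               # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e               # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# e.g. y = TopKMoE()(torch.randn(16, 512))
```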
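For the MoH entry, a minimal sketch of treating attention heads as routable experts: a per-token router keeps the top-k heads and softmax-weights their outputs instead of summing all heads equally. Names, routing placement, and hyperparameters are assumptions for illustration, not the repository's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfHeadsAttention(nn.Module):
    """Illustrative mixture-of-heads attention: route each token to its top-k heads."""
    def __init__(self, d_model=512, num_heads=8, k=4):
        super().__init__()
        assert d_model % num_heads == 0
        self.h, self.k, self.d_head = num_heads, k, d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.router = nn.Linear(d_model, num_heads)   # per-token head gate
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        B, T, D = x.shape
        qkv = self.qkv(x).view(B, T, 3, self.h, self.d_head)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)          # each (B, h, T, d_head)
        heads = F.scaled_dot_product_attention(q, k, v)  # (B, h, T, d_head)
        heads = heads.transpose(1, 2)                 # (B, T, h, d_head)

        # Keep the top-k heads per token, softmax-weight them, zero out the rest.
        logits = self.router(x)                       # (B, T, h)
        topv, topi = logits.topk(self.k, dim=-1)
        gates = torch.zeros_like(logits).scatter(-1, topi, F.softmax(topv, dim=-1))
        return self.out((heads * gates.unsqueeze(-1)).reshape(B, T, D))

# e.g. y = MixtureOfHeadsAttention()(torch.randn(2, 128, 512))
```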