vLLM is an efficient open-source library for accelerating large language model inference, achieving high throughput and low latency through optimized memory management and distributed processing.
A high-throughput and memory-efficient inference and serving engine for LLMs
Created: 2023-02-09
Last updated: 2025-08-02T03:05:04Z
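As a minimal sketch of what the "high-throughput inference" claim looks like in practice, the snippet below runs offline batch generation through vLLM's public Python API. The model name (facebook/opt-125m) and the sampling settings are illustrative assumptions, not values taken from this card.

```python
# Minimal offline-inference sketch using vLLM's Python API.
# Model choice and sampling parameters below are example assumptions.
from vllm import LLM, SamplingParams

# Load a model; vLLM manages KV-cache memory internally (PagedAttention).
llm = LLM(model="facebook/opt-125m")

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# A batch of prompts is served together, which is where the throughput
# benefits of vLLM's scheduling and memory management show up.
outputs = llm.generate(
    ["Hello, my name is", "The capital of France is"],
    sampling_params,
)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```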
Related projects:
- AIBrix: cost-efficient and pluggable infrastructure components for GenAI inference
- LLM Compressor: Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
- vLLM Ascend: community-maintained hardware plugin for running vLLM on Ascend