vLLM is an efficient open-source library for accelerating large language model inference, achieving high throughput and low latency through optimized memory management and distributed processing.
A high-throughput and memory-efficient inference and serving engine for LLMs
Created: 2023-02-09
Last updated: 2025-08-02T03:05:04Z
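As a minimal sketch of what the "high-throughput inference" claim looks like in practice, the snippet below runs offline batch generation through vLLM's public Python API. The model name (facebook/opt-125m) and the sampling settings are illustrative assumptions, not values taken from this card.

```python
# Minimal offline-inference sketch using vLLM's Python API.
# Model choice and sampling parameters below are example assumptions.
from vllm import LLM, SamplingParams

# Load a model; vLLM manages KV-cache memory internally (PagedAttention).
llm = LLM(model="facebook/opt-125m")

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# A batch of prompts is served together, which is where the throughput
# benefits of vLLM's scheduling and memory management show up.
outputs = llm.generate(
    ["Hello, my name is", "The capital of France is"],
    sampling_params,
)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```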
Related projects:
- AIBrix: cost-efficient and pluggable infrastructure components for GenAI inference
- LLM Compressor: Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
- vLLM Ascend: community-maintained hardware plugin for running vLLM on Ascend