#Large Language Models#Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
#Large Language Models#OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs).
#Large Language Models#☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
#Large Language Models#gpt_server is an open-source framework for production-grade deployment of LLMs, Embedding, Reranker, ASR, and TTS models.
#Large Language Models#kvcached: Elastic KV cache for dynamic GPU sharing and efficient multi-LLM inference.
#Large Language Models#Arks is a cloud-native inference framework running on Kubernetes
Throughput benchmarks for DeepSeek-V3 and R1 (671B) on 8xH100
#Natural Language Processing#A guide to structured generation using constrained decoding
#Large Language Models#Controllable Language Model Interactions in TypeScript
#Large Language Models#Experiments with LLMs in clouds (powered by SGLang)
#Large Language Models#The Private AI Setup Dream Guide for Demos automates the installation of the software needed for a local private AI setup, utilizing AI models (LLMs and diffusion models) for use cases such as general...
llmd is an LLM daemonset. It provides a model manager to get large language models up and running, and it can use llama.cpp, vLLM, or SGLang to run them.
#Large Language Models#Bench360 is a modular benchmarking suite for local LLM deployments. It offers a full-stack, extensible pipeline to evaluate the latency, throughput, quality, and cost of LLM inference on consumer and ...