#Large Language Models# Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech ge...
LLM quantization (compression) toolkit with hardware acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
#Large Language Models# OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
#Large Language Models# ☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
#Large Language Models# gpt_server is an open-source framework for production-grade deployment of LLMs, Embedding, Reranker, ASR, TTS, text-to-image, image-editing, and text-to-video models.
#Large Language Models# Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
#Large Language Models# A workload for deploying LLM inference services on Kubernetes
#Large Language Models# Arks is a cloud-native inference framework running on Kubernetes
A high-performance RDMA-based distributed storage system for fast LLM inference and GPU training
DeepSeek-V3, R1 671B on 8xH100 Throughput Benchmarks
#Large Language Models# Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration
#Natural Language Processing# A guide to structured generation using constrained decoding
#Large Language Models# Controllable Language Model Interactions in TypeScript
#Large Language Models# Experiments with LLMs in clouds (powered by SGLang)