#Large Language Models# Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
#Large Language Models# Structured data extraction and instruction calling with ML, LLM and Vision LLM
📚 A curated list of Awesome LLM/VLM Inference Papers with Code: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉
#Large Language Models# High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
#Large Language Models# Simple, scalable AI model deployment on GPU clusters
#Large Language Models# RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of con...
Model swapping for llama.cpp (or any local OpenAI-compatible server)
#Large Language Models# 🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI...
#Large Language Models# AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
#Large Language Models# Evaluate your LLM's response with Prometheus and GPT4 💯
#Large Language Models# Community-maintained hardware plugin for vLLM on Ascend
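#Large Language Models# A lightweight, end-to-end large-model application project that is easy to extend (Large Model Data Assistant). Supports large models such as DeepSeek and Qwen2.5. A one-stop large-model application development project built on Dify, Ollama & vLLM, Sanic, and Text2SQL 📊, with a modern UI built using Vue3, TypeScript, and Vite 5. Through ECharts 📈, it supports large-model-based data...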
#Large Language Models# LLM notes, including model inference, transformer model structure, and LLM framework code analysis notes.
Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.