vllm · GitHub Topics

#large-language-model# Llama 2 fine-tuning/inference methods and examples

artificial-intelligence finetuning langchain llama llama2 large-language-model machine-learning Python PyTorch vllm

Jupyter Notebook 17.7 k

4 hours ago

#large-language-model# Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...

ggml PyTorch chatglm deployment flan-t5 large-language-model wizardlm artificial-intelligence machine-learning Whisper inference openai-api mistral gemma llama llamacpp vllm qwen llama3 glm4

Python 8.31 k

5 hours ago

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

transformers vllm large-language-models raylib reinforcement-learning-from-human-feedback reinforcement-learning openai-o1 proximal-policy-optimization

Python 7.54 k

7 hours ago

katanaml / sparrow

#large-language-model# Structured data extraction and instruction calling with ML, LLM and Vision LLM

machine-learning huggingface-transformers natural-language-processing computer-vision gpt large-language-model rag vllm

Python 4.93 k

1 month ago

xlite-dev / Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

flash-attention tensorrt-llm vllm llm-inference deepseek deepseek-v3 deepseek-r1 qwen3

Python 4.32 k

1 day ago

LMCache / LMCache

#large-language-model# Supercharge Your LLM with the Fastest KV Cache Layer

amd CUDA inference kv-cache large-language-model PyTorch rocm vllm fast speed

Python 3.72 k

3 hours ago

PaddlePaddle / FastDeploy

#large-language-model# High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

serving ernie large-language-model inference llm-serving openai vllm ernie-45 ernie-45-vl

Python 3.44 k

1 day ago

gpustack / gpustack

#large-language-model# Simple, scalable AI model deployment on GPU clusters

ascend CUDA deepseek distributed-inference genai inference llama llamacpp large-language-model maas metal openai qwen rocm vllm mindie llm-inference llm-serving local-ai heterogeneous-cluster

Python 3.17 k

1 day ago

containers / ramalama

#large-language-model# RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of con...

artificial-intelligence containers CUDA hip inference-server intel llamacpp large-language-model podman vllm

Python 1.94 k

1 day ago

mostlygeek / llama-swap

Model swapping for llama.cpp (or any local OpenAPI compatible server)

Go llama llamacpp localllama localllm openai openai-api vllm

Go 1.09 k

4 days ago

bricks-cloud / BricksLLM

#large-language-model# 🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI...

Go large-language-model openai artificial-intelligence anthropic Azure gpt PostgreSQL REST API ycombinator API Docker privacy-security generative-ai Open Source self-hosted vllm

Go 1.07 k

7 months ago

substratusai / kubeai

#large-language-model# AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.

Kubernetes large-language-model openai-api autoscaler ollama vllm ollama-operator vllm-operator artificial-intelligence Whisper faster-whisper

Go 1.03 k

10 days ago

prometheus-eval / prometheus-eval

#large-language-model# Evaluate your LLM's response with Prometheus and GPT4 💯

evaluation litellm large-language-model llmops Python vllm gpt4 llm-as-a-judge

Python 976

3 months ago

vllm-project / vllm-ascend

#large-language-model# Community maintained hardware plugin for vLLM on Ascend

ascend inference large-language-model llm-serving llmops mlops model-serving transformer vllm

Python 942

3 hours ago

apconw / sanic-web

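#large-language-model# A lightweight, end-to-end large-model application project (Large Model Data Assistant) that is easy to extend and customize. It supports large models such as DeepSeek and Qwen2.5 and is a one-stop large-model application development project built on Dify, Ollama & Vllm, Sanic, and Text2SQL 📊, with a modern UI built with Vue3, TypeScript, and Vite 5. It supports using ECharts 📈 for large-model-based data...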