GitHub 中文社区 (GitHub Chinese Community)
©2025 GitHub中文社区


inference-server

containers / ramalama

#LLM# RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of con...

AI, containers, CUDA, Docker, hip, inference-server, intel, llamacpp, LLM, podman, vllm
Python 1.76k
5 days ago
roboflow / inference

#Computer Science# Turn any computer or edge device into a command center for your computer vision projects.

computer-vision, inference-api, inference-server, vit, yolov5, yolov8, jetson, tensorrt, classification, instance-segmentation, object-detection, onnx, deployment, Docker, inference, machine-learning, Python, yolo11, agents
Python 1.74k
2 days ago
basetenlabs / truss

#Computer Science# The simplest way to serve AI/ML models in production.

machine-learning, AI, easy-to-use, inference-api, inference-server, model-serving, Open Source, packaging, falcon, stable-diffusion, Whisper, wizardlm
Python 1k
5 days ago
pipeless-ai / pipeless

#Computer Science# An open-source computer vision framework to build and deploy apps in minutes.

AI, computer-vision, multimedia, multimedia-applications, cloud, deep-learning, machine-learning, object-detection, Video, yolo, FFmpeg, gstreamer, inference-server, Python, vision-framework, inference, perception, pipeline-framework, stream-processing, video-processing
Rust 756
1 year ago
underneathall / pinferencia

#NLP# Python + Inference: a model deployment library in Python. The simplest model inference server ever.

AI, inference-server, predict, inference, deep-learning, machine-learning, Python, serving, model-deployment, huggingface, PyTorch, Tensorflow, transformers, data-science, model-serving, computer-vision, NLP, paddlepaddle
Python 556
2 years ago
NVIDIA / gpu-rest-engine

#Computer Science# A REST API for Caffe using Docker and Go.

caffe, gpu, inference, inference-server, Docker, deep-learning
C++ 419
7 years ago
BMW-InnovationLab / BMW-YOLOv4-Inference-API-GPU

#Computer Science# This is a repository for a no-code object detection inference API using the Yolov3 and Yolov4 Darknet framework.

yolov3, inference, gpu, API, deep-learning, computer-vision, bounding-boxes, inference-server, Docker, REST API, yolo, neural-network, Dockerfile, yolov4, no-code
Python 280
3 years ago
containers / podman-desktop-extension-ai-lab

Work with LLMs on a local environment using containers

AI, containers, inference-server, LLM, local, podman
TypeScript 229
5 days ago
BMW-InnovationLab / BMW-YOLOv4-Inference-API-CPU

#Computer Science# This is a repository for a no-code object detection inference API using Yolov4 and Yolov3 with OpenCV.

yolov3, inference, API, cpu, deep-learning, computer-vision, OpenCV, object-detection, Docker, deep-neural-network, neural-network, REST API, inference-server, bounding-boxes, yolov4, no-code
Python 220
3 years ago
BMW-InnovationLab / BMW-TensorFlow-Inference-API-CPU

#Computer Science# This is a repository for an object detection inference API using the Tensorflow framework.

Tensorflow, inference, API, cpu, deep-learning, object-detection, computer-vision, Docker, bounding-boxes, Docker Image, docker-ce, inference-engine, inference-server, REST API
Python 183
3 years ago
kibae / onnxruntime-server

#Computer Science# ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.

AI, machine-learning, onnx, onnxruntime, deep-learning, inference-server, neural-networks, CUDA, contributions-welcome
C++ 158
1 month ago
autodeployai / ai-serving

Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints

onnx, inference-server, onnx-models, inference
Scala 158
8 months ago
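Several of the servers above (ai-serving included) expose plain HTTP/JSON prediction endpoints. A minimal client sketch follows, using only the Python standard library; the endpoint path and the `columns`/`data` payload fields are illustrative assumptions, not taken from any of these projects' docs — check each server's own request schema before use.

```python
import json
from urllib import request

def build_predict_payload(rows, columns):
    """Build a records-style JSON payload for a tabular model.

    The field names ("columns", "data") are illustrative placeholders;
    real servers define their own request schema.
    """
    return json.dumps({"columns": columns, "data": rows})

def predict(url, rows, columns, timeout=10):
    """POST the JSON payload to an inference endpoint and parse the reply."""
    body = build_predict_payload(rows, columns).encode("utf-8")
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())

# Building a payload for a hypothetical two-feature model
# (no server is contacted here):
payload = build_predict_payload([[5.1, 3.5]], ["sepal_len", "sepal_wid"])
```

The same pattern (serialize inputs, POST JSON, parse JSON response) applies to most of the REST-style servers in this list; gRPC endpoints instead require the server's generated client stubs.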
vertexclique / orkhon

#Computer Science# Orkhon: ML Inference Framework and Server Runtime.

inference-server, machine-learning, Python, Tensorflow, async, multiprocessing, data-parallelism
Rust 149
4 years ago
kf5i / k3ai

K3ai is a lightweight, fully automated, AI infrastructure-in-a-box solution that allows anyone to experiment quickly with Kubeflow pipelines. K3ai is perfect for anything from Edge to laptops.

kubeflow-pipelines, Kubernetes, k3s, machine-learning, datascience, AI, Edge, kubeflow, inference-server
PowerShell 101
4 years ago
notAI-tech / fastDeploy

#Computer Science# Deploy DL/ML inference pipelines with minimal extra code.

deep-learning, PyTorch, serving, falcon, gevent, Docker, model-deployment, model-serving, http-server, gunicorn, triton-inference-server, Python, triton, inference-server, streaming-audio, WebSocket
Python 98
7 months ago
RubixML / Server

#Computer Science# A standalone inference server for trained Rubix ML estimators.

machine-learning, http-server, infrastructure, API, model-deployment, microservice, JSON:API, PHP, REST API, inference, inference-engine, ml-infrastructure, inference-server
PHP 62
3 months ago
friendliai / friendli-client

#LLM# Friendli: the fastest serving engine for generative AI.

generative-ai, LLM, llm-inference, llmops, serving, gpt, gpt3, inference, llama2, llm-serving, inference-engine, inference-server, AI, llm-ops, mistral, machine-learning, mlops, stable-diffusion
Python 47
5 months ago
curtisgray / wingman

#Downloader# Wingman is the fastest and easiest way to run Llama models on your PC or Mac.

AI, chatbot, ChatGPT, Linux, llama, llamacpp, LLM, local, macOS, Windows, download, downloader, openai, gpu, gpu-acceleration, gpu-monitoring, inference, inference-engine, inference-server
TypeScript 41
1 year ago
k9ele7en / Triton-TensorRT-Inference-CRAFT-pytorch

Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch); includes a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server -...

triton-inference-server, tensorrt, onnx, PyTorch, nvidia-docker, inference-engine, inference-server, inference, text-detection
Python 33
4 years ago
haicheviet / fullstack-machine-learning-inference

#Computer Science# Fullstack machine learning inference template.

Amazon Web Services, cloudformation, FastAPI, full-stack, inference-server, Infrastructure as Code, machine-learning
Jupyter Notebook 30
2 years ago