#Large Language Models# vLLM is an efficient open-source library for accelerating large language model inference, achieving high throughput and low latency through optimized memory management and distributed processing.
#Computer Science# Large-scale LLM inference engine
Foundation model benchmarking tool. Run any model on any AWS platform and benchmark performance across instance types and serving stack options.
This Guidance demonstrates how to deploy a machine learning inference architecture on Amazon Elastic Kubernetes Service (Amazon EKS). It addresses the basic implementation requirements as well as ways...
#Natural Language Processing# CMP314: Optimizing NLP models with Amazon EC2 Inf1 instances in Amazon SageMaker
Collection of best practices, reference architectures, examples, and utilities for foundation model development and deployment on AWS.
This repository provides an easy, hands-on way to get started with AWS Inferentia. A demonstration of this hands-on material can be seen in the AWS Innovate 2023 - AIML Edition session.
Sentence Transformers on EC2 Inf1
#Computer Science# Scalable multimodal AI system combining FSDP, RLHF, and Inferentia optimization for customer insights generation.
#Large Language Models# Deploy large models on AWS Inferentia (Inf2) instances.
#Large Language Models# A high-throughput and memory-efficient inference and serving engine for LLMs