vLLM is an efficient open-source library for accelerating large language model inference, achieving high throughput and low latency through optimized memory management and distributed processing.
Foundation model benchmarking tool. Run any model on any AWS platform and benchmark its performance across instance types and serving stack options.
A production-ready inference server supporting any AI model on all major hardware platforms (CPU, GPU, TPU, Apple Silicon). Inferno seamlessly deploys and serves language models from Hugging Face, loc...