rocm · GitHub Topics

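#Large Language Models# vLLM is an efficient open-source library for accelerating large language model inference, achieving high throughput and low latency through optimized memory management and distributed processing.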

gpt llm PyTorch llmops mlops model-serving transformer llm-serving inference llama amd rocm CUDA inferentia trainium tpu xpu hpu deepseek qwen

Python 53.63 k

2 hours ago

apache / tvm

#Computer Science# Open deep learning compiler stack for CPU, GPU and specialized accelerators

compiler tensor deep-learning gpu opencl metal performance JavaScript rocm tvm vulkan spirv machine-learning

Python 12.49 k

13 hours ago

tracel-ai / burn

#Computer Science# burn is a deep learning framework for Rust

autodiff deep-learning machine-learning Rust scientific-computing ndarray tensor neural-network PyTorch cross-platform kernel-fusion onnx WebAssembly webgpu CUDA metal rocm vulkan

Rust 12.4 k

7 hours ago

cupy / cupy

NumPy & SciPy for GPU

CUDA cudnn cublas cusolver nccl Python NumPy cupy curand cusparse gpu SciPy tensor rocm

Python 10.38 k

2 days ago

LMCache / LMCache

#Large Language Models# Supercharge Your LLM with the Fastest KV Cache Layer

amd CUDA inference kv-cache llm PyTorch rocm vllm fast speed

Python 3.72 k

3 hours ago

gpustack / gpustack

#Large Language Models# Simple, scalable AI model deployment on GPU clusters

ascend CUDA deepseek distributed-inference genai inference llama llamacpp llm maas metal openai qwen rocm vllm mindie llm-inference llm-serving local-ai heterogeneous-cluster

Python 3.17 k

1 day ago

deepmodeling / deepmd-kit

#Computer Science# A deep learning package for many-body potential energy representation and molecular dynamics

deep-learning Molecular Dynamics deepmd lammps potential-energy Python TensorFlow C++ CUDA rocm computational-chemistry materials-science C Node.js PyTorch jax paddle

Python 1.72 k

2 days ago

dmlc / nnvm

#Computer Science#

computation-graph deep-learning optimization deployment nnvm tvm CUDA opencl rocm metal

C++ 1.66 k

7 years ago

aphrodite-engine / aphrodite-engine

#Computer Science# Large-scale LLM inference engine

API inference-engine machine-learning CUDA inferentia rocm intel lora speculative-decoding tpu

C++ 1.49 k

9 days ago

stotko / stdgpu

stdgpu: Efficient STL-like Data Structures on the GPU

gpu gpu-computing gpu-acceleration gpgpu data-structures stl stl-like stl-containers C++ modern-cpp CUDA openmp rocm hip

C++ 1.23 k

3 months ago

ROCm / ROCm-docker

Dockerfiles for the various software layers defined in the ROCm software platform

rocm Docker

Shell 482

8 days ago

devnen / Chatterbox-TTS-Server

Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale te...

ai API audio-generation CUDA FastAPI huggingface openai-api Python PyTorch speech-synthesis text-to-speech tts tts-api voice-cloning web-ui rocm

Python 408

17 days ago

alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration 🦙

CUDA hpc gpu rocm hip openmp heterogeneous-parallel-programming C++ header-only tbb

C++ 389

21 days ago

ROCm / rocBLAS

[DEPRECATED] Moved to ROCm/rocm-libraries repo

blas rocm hip

C++ 383

19 hours ago

QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance p...

quantum-monte-carlo C++ high-performance-computing quantum-chemistry CUDA gpu hpc mpi rocm oneapi

C++ 353

8 days ago

ROCm / k8s-device-plugin

Kubernetes (k8s) device plugin to enable registration of AMD GPU to a container cluster

Kubernetes rocm

Go 334

14 hours ago

agenium-scale / nsimd

Agenium Scale vectorization library for CPUs and GPUs

simd simd-programming sse2 sse42 avx avx2 avx512 neon aarch64 simd-instructions CUDA rocm C++ hpc simd-library

C 333

4 years ago

JuliaGPU / AMDGPU.jl

AMD GPU (ROCm) programming in Julia

Julia rocm amdgpu gpu gpu-programming

Julia 314

7 days ago

ROCm / aomp

AOMP is an open source Clang/LLVM based compiler with added support for the OpenMP® API on Radeon™ GPUs. Use this repository for releases, issues, documentation, packaging, and examples.

amd LLVM clang openmp rocm

Fortran 226

20 hours ago

LLNL / hiop

HPC solver for nonlinear optimization problems

hpc nonlinear-optimization nonlinear-programming interior-point-method parallel-programming mpi bfgs constrained-optimization solver optimization CUDA math-physics radiuss rocm

C++ 219

11 hours ago