#Large language models#LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥
#Computer science#Deep learning in Rust, with shape-checked tensors and neural networks
Safe Rust wrapper around the CUDA toolkit
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
#Computer science#Zero to Hero GPU and CUDA tutorials for Maths & ML, with examples.
Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPU/GPU, NVIDIA, AMD without writing any additional C kernel code. Write your funct...
Some CUDA design patterns and a bit of template magic for CUDA
Spiking Neural Networks in C++ with strong GPU acceleration through CUDA
#Algorithm practice#CUDA kernel author's tools
Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research
#Computer science#Triton implementation of FlashAttention2 that adds custom masks.
#Computer science#High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline.
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.