Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking, and user-space IO
#Large Language Models# 🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.
Free software file format parser for Avid ProTools sessions
A simple profiler that counts NVIDIA PTX assembly instructions in OpenCL/SYCL/CUDA kernels for roofline model analysis.
Energinet's Model Testbench. Automate grid compliance studies in PSCAD and PowerFactory.
This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA kernels, Triton spells, and PTX sorcery.
GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving
Set of examples written for hardware acceleration via TornadoVM
Inline PTX Assembly in CUDA example
Bloch's equations and Optimal Control for MRI and NMR applications
FastPtx: a Python pTx pulse design tool for freely optimizing RF and gradient pulses with autodifferentiation