Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking, and user-space IO
#Large Language Models#🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.
Free software file format parser for Avid ProTools sessions
A simple profiler that counts NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
Energinets Model Testbench. Automate grid compliance studies in PSCAD and PowerFactory.
GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving
Set of examples written for hardware acceleration via TornadoVM
Inline PTX Assembly in CUDA example
Bloch's equations and Optimal Control for MRI and NMR applications
Visual Studio Code extension with PTX assembly syntax support
FastPtx: a Python pTx pulse design tool for freely optimizing RF and gradient pulses with automatic differentiation