gemm

OpenNMT / CTranslate2

#Computer Science# Fast inference engine for Transformer models

neural-machine-translation, C++, mkl, quantization, CUDA, thrust, opennmt, deep-neural-networks, openmp, onednn, intrinsics, avx2, avx, parallel-computing, gemm, neon, transformer-models, machine-translation, deep-learning, inference
C++ · 3.86k stars · updated 2 months ago
flame / how-to-optimize-gemm

gemm, matrix-multiplication, blis
C · 1.89k stars · updated 2 years ago
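For reference while browsing the optimization projects in this list: the BLAS GEMM routine computes C ← αAB + βC (optionally with A or B transposed). The Python sketch below is a plain, unoptimized version of that definition, assuming row-major lists of lists and no transposes; the function name `gemm` is illustrative and not taken from any repository here.

```python
def gemm(alpha, A, B, beta, C):
    """Reference GEMM: return alpha * A @ B + beta * C (row-major lists of lists).

    A is m x k, B is k x n, C is m x n. The loop order i, p, j keeps the
    innermost loop walking along contiguous rows of B and of the output,
    which is the order one would pick for row-major arrays in C.
    """
    m, k, n = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "inner dimensions must match"
    out = [[beta * C[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        row_out = out[i]
        for p in range(k):
            a_ip = alpha * A[i][p]  # hoisted out of the innermost loop
            row_b = B[p]
            for j in range(n):
                row_out[j] += a_ip * row_b[j]
    return out
```

With `alpha = 1` and `beta = 0` this reduces to the plain product A @ B.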
CNugteren / CLBlast

Tuned OpenCL BLAS

blas, opencl, blas-libraries, matrix-multiplication, gemm, gpu
C++ · 1.11k stars · updated 2 months ago
flame / blislab

BLISlab: A Sandbox for Optimizing GEMM

gemm, matrix-multiplication, blis
C · 527 stars · updated 4 years ago
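A central idea in sandboxes like this is cache blocking: splitting the three GEMM loops so each inner computation touches only blocks small enough to stay in cache. The sketch below is an illustrative assumption, not BLISlab's code; its outer three loops follow the ordering commonly described for BLIS (partition n by `nc`, then k by `kc`, then m by `mc`), while packing and the register-level micro-kernel are omitted. The block sizes are arbitrary defaults, not tuned values.

```python
def gemm_blocked(A, B, mc=64, kc=64, nc=64):
    """Compute A @ B with three levels of cache blocking (no packing, no micro-kernel).

    The outer loops walk over nc-wide column panels of B, kc-deep slices of the
    shared dimension, and mc-tall row panels of A, so each inner triple loop
    touches only an mc x kc block of A, a kc x nc block of B and an mc x nc
    block of C.
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for jc in range(0, n, nc):
        j_end = min(jc + nc, n)
        for pc in range(0, k, kc):
            p_end = min(pc + kc, k)
            for ic in range(0, m, mc):
                i_end = min(ic + mc, m)
                # Multiply the current blocks and accumulate into C.
                for i in range(ic, i_end):
                    row_c = C[i]
                    for p in range(pc, p_end):
                        a_ip = A[i][p]
                        row_b = B[p]
                        for j in range(jc, j_end):
                            row_c[j] += a_ip * row_b[j]
    return C
```

Blocking changes only the order of the multiply-adds, so with integer inputs the result matches the naive product exactly.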
Bruce-Lee-LY / cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

CUDA, gemm, cublas, Nvidia, gpu
Cuda · 418 stars · updated 9 months ago
yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs

Optimizing SGEMM kernels on NVIDIA GPUs to close-to-cuBLAS performance.

CUDA, gemm, Nvidia, optimization
Cuda · 355 stars · updated 5 months ago
salykova / sgemm.c

Multi-Threaded FP32 Matrix Multiplication on x86 CPUs

C, gemm, matrix-multiplication, openmp, cpu
C · 351 stars · updated 2 months ago
mratsim / laser

#Computer Science# The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats a...

high-performance-computing, deep-learning, blas, gemm, convolution, jit, Assembly, simd, openmp, tensor, parallel, matrix-multiplication
Nim · 285 stars · updated 1 year ago
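One common way to reuse a GEMM kernel for convolution is im2col lowering: unfold every input patch into a column, flatten each filter into a row, and a single matrix product yields all outputs. This is a generic sketch, not a claim about how laser implements convolution; it assumes a single input channel, stride 1 and no padding, and (as in most deep-learning code) computes cross-correlation. The names `im2col` and `conv2d_as_gemm` are illustrative.

```python
def im2col(image, kh, kw):
    """Unfold every kh x kw patch of a 2-D image (valid padding, stride 1) into a column.

    Returns (cols, out_h, out_w), where cols is a (kh*kw) x (out_h*out_w) matrix
    and column c holds the patch for output position c in row-major order.
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = [[0] * (out_h * out_w) for _ in range(kh * kw)]
    for oy in range(out_h):
        for ox in range(out_w):
            c = oy * out_w + ox
            for dy in range(kh):
                for dx in range(kw):
                    cols[dy * kw + dx][c] = image[oy + dy][ox + dx]
    return cols, out_h, out_w


def conv2d_as_gemm(image, filters):
    """Cross-correlate image with each filter via one GEMM: (F x K) @ (K x P)."""
    kh, kw = len(filters[0]), len(filters[0][0])
    cols, out_h, out_w = im2col(image, kh, kw)
    weights = [[v for row in f for v in row] for f in filters]  # F x K, K = kh*kw
    out = []
    for wrow in weights:
        # One row of the GEMM result: this filter applied at every output position.
        flat = [sum(wk * cols[k][c] for k, wk in enumerate(wrow))
                for c in range(out_h * out_w)]
        out.append([flat[y * out_w:(y + 1) * out_w] for y in range(out_h)])
    return out
```

The price of this approach is memory: each input pixel is copied into up to kh·kw columns.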
coderonion / awesome-cuda-and-hpc

#Large Language Models# 🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

CUDA, cublas, tensorrt, Awesome Lists, large-language-models, gpu, blas, PyTorch, hpc, gemm, llama, cudnn, triton, tensorrt-llm, cutlass, mlir, tvm, deepseek, ptx, vlm
278 stars · updated 16 days ago
ROCm / Tensile

#Computer Science# [DEPRECATED] Moved to the ROCm/rocm-libraries repo

gemm, blas, dnn, neural-networks, machine-learning, tensors, Python, opencl, hip, auto-tuning, amd, gpu-computing, gpu-acceleration, gpu, matrix-multiplication, Assembly
Python · 244 stars · updated 2 days ago
yzhaiustc / Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F

Step-by-step optimizations of DGEMM on CPUs, eventually outperforming Intel MKL, even with multithreading.

blas, gemm, avx512, simd, mkl, openmp
C · 148 stars · updated 3 years ago
cp2k / dbcsr

DBCSR: Distributed Block Compressed Sparse Row matrix library

blas, matrix-multiplication, gemm, CUDA, sparse-matrix, mpi, hpc, linear-algebra
Fortran · 142 stars · updated 6 days ago
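The block compressed sparse row (BCSR) layout named above stores only the nonzero b × b blocks of a matrix, CSR-style: a row-pointer array per block row, a block-column index per stored block, and the dense blocks themselves. Multiplying by a dense matrix then becomes a series of small dense GEMMs, one per stored block. The sketch below is a single-process illustration with a fixed block size `b`; the distribution of blocks across processes that DBCSR is built for is omitted, and the names `dense_to_bcsr` and `bcsr_matmul_dense` are hypothetical.

```python
def dense_to_bcsr(M, b):
    """Convert a dense matrix (dimensions divisible by b) to block CSR.

    Only b x b blocks containing a nonzero are stored. Returns
    (block_row_ptr, block_col_idx, blocks): the blocks of block row r are
    blocks[block_row_ptr[r]:block_row_ptr[r + 1]], and block_col_idx gives
    the block column of each stored block.
    """
    n_rows, n_cols = len(M), len(M[0])
    assert n_rows % b == 0 and n_cols % b == 0, "sketch assumes dimensions divisible by b"
    block_row_ptr, block_col_idx, blocks = [0], [], []
    for br in range(n_rows // b):
        for bc in range(n_cols // b):
            blk = [row[bc * b:(bc + 1) * b] for row in M[br * b:(br + 1) * b]]
            if any(v != 0 for r in blk for v in r):
                block_col_idx.append(bc)
                blocks.append(blk)
        block_row_ptr.append(len(blocks))
    return block_row_ptr, block_col_idx, blocks


def bcsr_matmul_dense(bcsr, b, X):
    """Multiply a BCSR matrix by a dense matrix X; each stored block is a small dense GEMM."""
    block_row_ptr, block_col_idx, blocks = bcsr
    n_block_rows = len(block_row_ptr) - 1
    n = len(X[0])
    Y = [[0] * n for _ in range(n_block_rows * b)]
    for br in range(n_block_rows):
        for idx in range(block_row_ptr[br], block_row_ptr[br + 1]):
            bc, blk = block_col_idx[idx], blocks[idx]
            for i in range(b):
                y_row = Y[br * b + i]
                for p in range(b):
                    a = blk[i][p]
                    x_row = X[bc * b + p]
                    for j in range(n):
                        y_row[j] += a * x_row[j]
    return Y
```

Compared with element-wise CSR, one index is stored per block instead of per nonzero, and the inner work is dense, which is what makes blocked formats attractive for GEMM-style kernels.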
yui0 / slibs

Single file libraries for C/C++

C, single-header-lib, audio, flac, mp3, gpgpu, mpeg, mp4, m4a, aac, glsl, opencl, gemm, blas, ascii, codec, encoder, math, alsa, kms
C · 122 stars · updated 23 days ago
ROCm / hipBLASLt

#Computer Science# hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionality beyond a traditional BLAS library.

amd, Assembly, blas, gemm, gpu-computing, hip, machine-learning, matrix-multiplication, rocm
Assembly · 102 stars · updated 5 days ago
BoooC / CNN-Accelerator-Based-on-Eyeriss-v2

A Flexible and Energy-Efficient Accelerator for Sparse Convolutional Neural Networks

accelerator, convolutional-neural-networks, deep-neural-networks, gemm, sparse-matrix
Verilog · 77 stars · updated 4 months ago
enp1s0 / ozIMMU

FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme

CUDA, gemm
Cuda · 71 stars · updated 3 months ago
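The general idea behind emulating FP64 GEMM with integer units can be sketched as follows; this is a simplified illustration under stated assumptions, not ozIMMU's actual algorithm. Each row of A (and each column of B) is scaled by a power of two so its entries lie in (-1, 1), then cut into integer slices of at most 7 bits plus sign, which fit in int8. Every slice-by-slice dot product is then an exact integer computation (the kind Int8 Tensor Cores perform with int32 accumulation), and the partial products are rescaled and summed in double precision. The names `split_rows` and `ozaki_gemm`, the default slice count, and computing all slice pairs are illustrative choices; practical implementations typically skip the smallest cross terms and must guard integer accumulators against overflow.

```python
import math


def split_rows(M, num_slices, bits=7):
    """Split each row of M into num_slices integer matrices plus a per-row exponent.

    Row i is scaled by 2**-e_i, where 2**e_i bounds its largest |entry|, so every
    scaled value r satisfies |r| < 1. Each slice then takes the next `bits` bits
    of r as an integer in [-(2**bits - 1), 2**bits - 1] (int8 range for bits=7).
    """
    rows, cols = len(M), len(M[0])
    slices = [[[0] * cols for _ in range(rows)] for _ in range(num_slices)]
    exps = []
    for i, row in enumerate(M):
        amax = max(abs(x) for x in row)
        e = math.frexp(amax)[1]  # amax = m * 2**e with 0.5 <= m < 1 (e = 0 if amax == 0)
        exps.append(e)
        for k, x in enumerate(row):
            r = math.ldexp(x, -e)  # exact power-of-two scaling
            for s in range(num_slices):
                r = math.ldexp(r, bits)  # move the next `bits` bits above the binary point
                q = int(r)  # truncate toward zero
                slices[s][i][k] = q
                r -= q  # exact: the fractional part of a float is representable
    return slices, exps


def ozaki_gemm(A, B, num_slices=8, bits=7):
    """Approximate A @ B in double precision using only exact integer dot products."""
    a_slices, ea = split_rows(A, num_slices, bits)
    Bt = [list(col) for col in zip(*B)]  # split B by columns
    b_slices, eb = split_rows(Bt, num_slices, bits)
    m, n = len(A), len(Bt)
    C = [[0.0] * n for _ in range(m)]
    for s in range(num_slices):
        for t in range(num_slices):
            shift = -bits * (s + t + 2)  # undo the 2**bits per-slice shifts of both operands
            for i in range(m):
                for j in range(n):
                    # Exact integer dot product (an int8 x int8 -> int32 product on Tensor Cores).
                    dot = sum(p * q for p, q in zip(a_slices[s][i], b_slices[t][j]))
                    C[i][j] += math.ldexp(dot, ea[i] + eb[j] + shift)
    return C
```

Entries much smaller than the largest entry in their row or column lose low-order bits during slicing, so accuracy depends on `num_slices`.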
aredden / torch-cublas-hgemm

PyTorch half-precision GEMM library with optional fused bias and optional ReLU/GELU

CUDA, float16, gemm, PyTorch
Cuda · 69 stars · updated 6 months ago
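"Fused" bias and activation means they are applied in the GEMM epilogue, while each output value is still held in the accumulator, rather than in separate passes that write the product out and read it back. The sketch below shows only that control flow, in full precision; the half-precision storage is omitted, and whether this library uses the erf or tanh form of GELU is not stated here, so the erf form is an assumption. The name `gemm_bias_act` is hypothetical.

```python
import math


def gemm_bias_act(A, B, bias=None, activation=None):
    """Compute act(A @ B + bias), applying bias and activation in the epilogue.

    bias (length n) is broadcast across rows; activation is None, "relu" or "gelu".
    Each output element is finished right after its dot product, so the raw
    product is never stored and re-read.
    """
    m, k, n = len(A), len(B), len(B[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = sum(A[i][p] * B[p][j] for p in range(k))
            if bias is not None:
                acc += bias[j]
            if activation == "relu":
                acc = max(acc, 0.0)
            elif activation == "gelu":
                acc = 0.5 * acc * (1.0 + math.erf(acc / math.sqrt(2.0)))  # erf-based GELU
            elif activation is not None:
                raise ValueError(f"unknown activation: {activation}")
            out[i][j] = acc
    return out
```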
Bruce-Lee-LY / cuda_hgemv

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA Cores.

cublas, CUDA, gemm, gpu, Nvidia
Cuda · 62 stars · updated 9 months ago
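HGEMV computes y = Ax with half-precision (fp16) data. Each output element is an independent dot product, which is the unit of work a GPU kernel typically distributes across threads or warps. The sketch below models only the numerics, as an assumption: inputs are rounded to IEEE binary16 with Python's `struct` `"e"` format, accumulation happens in a wider type (Python's double here; GPU kernels commonly accumulate in FP32), and the result is rounded to fp16 once. The names `to_half` and `hgemv` are hypothetical.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 binary16 value (returned as float).

    Raises OverflowError if |x| is too large for binary16.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def hgemv(A, x):
    """y = A @ x with fp16 inputs and outputs and a wider accumulator."""
    A16 = [[to_half(v) for v in row] for row in A]
    x16 = [to_half(v) for v in x]
    y = []
    for row in A16:  # one independent dot product per output row
        acc = 0.0  # accumulate in higher precision, round once at the end
        for a, b in zip(row, x16):
            acc += a * b
        y.append(to_half(acc))
    return y
```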
CoffeeBeforeArch / mmul

Serial and parallel implementations of matrix multiplication

serial, matrix-multiplication, benchmarks, gemm, parallel
C++ · 41 stars · updated 4 years ago
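A common way to turn a serial matrix multiplication into a parallel one is to split the rows of C into contiguous chunks and compute each chunk as an independent task; chunks write disjoint rows, so no locking is needed. The sketch below shows that partitioning with Python's `concurrent.futures`; it is an illustration, not this repository's code, and the names `matmul_rows` and `matmul_parallel` are hypothetical. Because of CPython's GIL, threads will not speed up this pure-Python arithmetic; in C or C++ the same split would typically be expressed with OpenMP or `std::thread`.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_rows(A, B, row_start, row_end):
    """Serial kernel: compute rows [row_start, row_end) of A @ B."""
    k, n = len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(row_start, row_end)]


def matmul_parallel(A, B, workers=4):
    """Split the rows of C into contiguous chunks, one task per chunk."""
    m = len(A)
    chunk = max(1, -(-m // workers))  # ceil(m / workers)
    bounds = [(start, min(start + chunk, m)) for start in range(0, m, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so rows come back in order.
        parts = pool.map(lambda bound: matmul_rows(A, B, *bound), bounds)
        return [row for part in parts for row in part]
```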
andylolu2 / simpleGEMM

A simple but fast implementation of matrix multiplication in CUDA.

CUDA, gemm, matrix-multiplication
Cuda · 35 stars · updated 1 year ago