Efficient Triton Kernels for LLM Training
#LLM# Quantized Attention achieves speedups of 2-5x over FlashAttention and 3-11x over xformers, without losing end-to-end metrics across language, image, and video models.
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
A service for autodiscovery and configuration of applications running in containers
Playing with the Tigress software protection: breaking some of its protections and solving its reverse engineering challenges. Automatic deobfuscation using symbolic execution, taint analysis, and LLVM.
#Data Warehouse# 🚀🚀🚀 A collection of awesome public projects about Large Language Models (LLM), Vision Language Models (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), and the related Datasets and Applic...
FlagGems is an operator library for large language models implemented in the Triton Language.
#Computer Science# A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
Linux kernel module to support Turbo mode and RGB Keyboard for Acer Predator notebook series
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
Automatic ROPChain Generation
#LLM# 🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
LLVM based static binary analysis framework
OpenDILab RL HPC OP Lib, including CUDA and Triton kernels
#Computer Science# A performance library for machine learning applications.
#Computer Science# NVIDIA-accelerated, deep-learned model support for image-space object detection
#Computer Science# ClearML - Model-Serving Orchestration and Repository Solution
(WIP) A simple, lightweight, fast, integrated, pipelined deployment framework for algorithm services that ensures reliability, high concurrency, and scalability of...