# inference-optimization

https://static.github-zh.com/github_avatars/alibaba?size=40

#Computer Science# BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ 894
9 months ago
https://static.github-zh.com/github_avatars/jiazhihao?size=40
C++ 731
3 years ago
https://static.github-zh.com/github_avatars/bentoml?size=40
TypeScript 229
8 days ago
https://static.github-zh.com/github_avatars/mit-han-lab?size=40

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

C++ 201
3 years ago
https://static.github-zh.com/github_avatars/imedslab?size=40

#Computer Science# Batch normalization fusion for PyTorch. This repository is archived and no longer maintained.

Python 197
5 years ago
https://static.github-zh.com/github_avatars/ZFTurbo?size=40

Optimizes the layer structure of a Keras model to reduce computation time

Python 157
5 years ago
https://static.github-zh.com/github_avatars/Rapternmn?size=40

A set of tools that make working with TensorRT and ONNX Runtime easier. This repo is designed for YOLOv3.

Python 80
6 years ago
https://static.github-zh.com/github_avatars/BaiTheBest?size=40

Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)

Python 65
6 months ago
https://static.github-zh.com/github_avatars/keli-wen?size=40

#Large Language Models# Blog posts, reading reports, and code examples on AGI/LLM-related topics.

Python 45
7 months ago
https://static.github-zh.com/github_avatars/vbdi?size=40
Python 42
4 months ago
https://static.github-zh.com/github_avatars/ksm26?size=40

Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Pred...

Jupyter Notebook 17
1 year ago
https://static.github-zh.com/github_avatars/lmaxwell?size=40

Cross-platform, modular neural network inference library, small and efficient

C++ 13
2 years ago
https://static.github-zh.com/github_avatars/grazder?size=40

#Computer Science# A template for getting started writing code using GGML

C++ 10
1 year ago
https://static.github-zh.com/github_avatars/ccs96307?size=40

Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.

Python 10
2 months ago
https://static.github-zh.com/github_avatars/Harly-1506?size=40

Faster inference YOLOv8: Optimize and export YOLOv8 models for faster inference using OpenVINO and NumPy 🔢

Python 10
9 months ago
https://static.github-zh.com/github_avatars/ResponsibleAILab?size=40

Dynamic Attention Mask (DAM) generates adaptive sparse attention masks per layer and head for Transformer models, enabling long-context inference with lower compute and memory overhead without fine-tun...

Python 8
3 months ago
https://static.github-zh.com/github_avatars/amazon-science?size=40

#Large Language Models# LLM-Rank: A graph-theoretical approach to structured pruning of large language models based on weighted PageRank centrality, as introduced in the related paper.

Python 7
10 months ago
https://static.github-zh.com/github_avatars/yester31?size=40

Optimizing Monocular Depth Estimation with TensorRT: Model Conversion, Inference Acceleration, and 3D Reconstruction

Python 6
21 days ago