High-efficiency floating-point neural network inference operators for mobile, server, and Web
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
The Tensor Algebra SuperOptimizer for Deep Learning
Everything you need to know about LLM inference
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
Batch normalization fusion for PyTorch (a minimal fusion sketch follows this list). This repository is archived and no longer maintained.
Optimizes the layer structure of Keras models to reduce computation time
A set of tools to make your life easier with TensorRT and ONNX Runtime. This repo is designed for YOLOv3
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
Blog posts, reading reports, and code examples for AGI/LLM-related knowledge.
[CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching (see the sketch after this list) and Low-Rank Adapters (LoRA), and gain hands-on experience with Pred...
A cross-platform, modular neural network inference library; small and efficient
Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.
Faster YOLOv8 inference: optimize and export YOLOv8 models for faster inference using OpenVINO and NumPy 🔢
A template for getting started writing code using GGML
LLM-Rank: a graph-theoretical approach to structured pruning of large language models based on weighted PageRank centrality, as introduced in the accompanying paper.
Your AI Catalyst: an inference backend to maximize your model's inference performance
Dynamic Attention Mask (DAM) generates adaptive sparse attention masks per layer and head for Transformer models, enabling long-context inference with lower compute and memory overhead without fine-tuning.
A constrained expectation-maximization algorithm for feasible graph inference.
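The batch-normalization fusion entry above refers to folding a BatchNorm layer's affine transform into the preceding convolution so the pair runs as a single op at inference time. Below is a minimal PyTorch sketch of that idea; `fuse_conv_bn` is a hypothetical helper written for illustration, not code from the archived repository.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm2d that directly follows a Conv2d into the conv.

    Illustrative sketch only; assumes both modules are in eval mode so the
    BN running statistics are frozen.
    """
    fused = nn.Conv2d(
        conv.in_channels, conv.out_channels, conv.kernel_size,
        stride=conv.stride, padding=conv.padding,
        dilation=conv.dilation, groups=conv.groups, bias=True,
    )
    # Per-output-channel scale: gamma / sqrt(running_var + eps)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    # New bias: (old_bias - running_mean) * scale + beta
    fused.bias.copy_((conv_bias - bn.running_mean) * scale + bn.bias)
    return fused.eval()

# Quick check: the fused conv matches conv followed by BN in eval mode.
conv, bn = nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16)
x = torch.randn(1, 3, 32, 32)
bn(conv(x))                      # one training-mode pass to populate running stats
conv, bn = conv.eval(), bn.eval()
assert torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5)
```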
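Several entries above (the LLM serving course and the inference-acceleration repos) rely on KV caching: during autoregressive decoding, the keys and values of past tokens are stored so each new token attends over cached tensors instead of re-projecting the whole prefix. The single-head sketch below illustrates the idea; `attend_with_kv_cache` and its shapes are assumptions made for this example, not an API from any listed project.

```python
import torch

def attend_with_kv_cache(q_new, k_new, v_new, cache):
    """One decode step of single-head attention using a KV cache.

    q_new, k_new, v_new: (1, d) projections for the newly generated token.
    cache: dict of previously computed keys/values, so earlier tokens are
    never re-projected or re-encoded.
    """
    cache["k"] = torch.cat([cache["k"], k_new], dim=0)            # (t, d)
    cache["v"] = torch.cat([cache["v"], v_new], dim=0)            # (t, d)
    scores = q_new @ cache["k"].T / cache["k"].shape[-1] ** 0.5   # (1, t)
    return torch.softmax(scores, dim=-1) @ cache["v"]             # (1, d)

d = 64
cache = {"k": torch.empty(0, d), "v": torch.empty(0, d)}
for _ in range(4):  # pretend we decode 4 tokens
    q, k, v = (torch.randn(1, d) for _ in range(3))
    out = attend_with_kv_cache(q, k, v, cache)
```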