inference-optimization

google / XNNPACK

High-efficiency floating-point neural network inference operators for mobile, server, and Web

neural-networks, inference, inference-optimization, simd, cpu, multithreading, matrix-multiplication, convolutional-neural-networks, convolutional-neural-network, neural-network, mobile-inference

C 2.08k
2 days ago
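XNNPACK is usually reached through higher-level frameworks rather than called directly; in recent TensorFlow Lite releases it is the default CPU backend for floating-point models. A minimal Python sketch of that route ("model.tflite" and the thread count are placeholders):

```python
# Running a float TFLite model on CPU; TensorFlow Lite typically dispatches its
# conv/matmul kernels to XNNPACK under the hood. "model.tflite" is a placeholder.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite", num_threads=4)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.random.rand(*inp["shape"]).astype(np.float32)
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
print(interpreter.get_tensor(out["index"]).shape)
```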
alibaba / BladeDISC

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

compiler, deep-learning, machine-learning, PyTorch, TensorFlow, inference-optimization, mlir, neural-network

C++ 886
7 months ago
jiazhihao / TASO

The Tensor Algebra SuperOptimizer for Deep Learning

deep-learning, deep-neural-network, inference-optimization

C++ 726
3 years ago
bentoml / llm-inference-handbook

Everything you need to know about LLM inference

llm, llm-inference, inference-optimization

TypeScript 204
5 days ago
mit-han-lab / inter-operator-scheduler

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

inference-optimization, cnn, parallelism, acceleration

C++ 200
3 years ago
imedslab / pytorch_bn_fusion

Batch normalization fusion for PyTorch. This is an archived repository and is no longer maintained.

PyTorch, inference-optimization, deep-learning, deep-neural-network

Python 197
5 years ago
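For a sense of what batch-norm fusion does (a generic sketch of the algebra, not code from this repository): a Conv2d followed by a BatchNorm2d can be folded into a single Conv2d at inference time by rescaling the conv weights and bias with the BN statistics.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm2d that follows a Conv2d into the conv's weight and bias."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    # BN computes y = gamma * (x - mean) / sqrt(var + eps) + beta, so folding
    # scales each output channel of the conv by gamma / sqrt(var + eps).
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias
    return fused

conv, bn = nn.Conv2d(3, 8, 3, padding=1).eval(), nn.BatchNorm2d(8).eval()
x = torch.randn(1, 3, 16, 16)
print(torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-6))  # True
```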
ZFTurbo / Keras-inference-time-optimizer

Optimize the layer structure of a Keras model to reduce computation time

Keras, inference-optimization

Python 157
5 years ago
Rapternmn / PyTorch-Onnx-Tensorrt

A set of tools to make your life easier with TensorRT and ONNX Runtime. This repo is designed for YOLOv3.

tensorrt, onnxruntime, onnx, PyTorch, yolov3, inference-optimization, darknet

Python 80
6 years ago
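Independent of this repo's YOLOv3-specific tooling, the workflow it wraps is the standard PyTorch → ONNX → ONNX Runtime round trip; a minimal sketch with a toy model (the file name is illustrative):

```python
# Export a small PyTorch model to ONNX and run it with ONNX Runtime on CPU.
import torch
import torch.nn as nn
import onnxruntime as ort

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8 * 32 * 32, 10)).eval()
dummy = torch.randn(1, 3, 32, 32)

# Export with a dynamic batch dimension so batch size can vary at inference time.
torch.onnx.export(model, dummy, "tiny.onnx",
                  input_names=["input"], output_names=["logits"],
                  dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}})

sess = ort.InferenceSession("tiny.onnx", providers=["CPUExecutionProvider"])
(logits,) = sess.run(None, {"input": dummy.numpy()})
print(logits.shape)  # (1, 10)
```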
BaiTheBest / SparseLLM

Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)

pruning, inference-optimization, large-language-models, model-compression

Python 64
4 months ago
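SparseLLM's contribution is a global pruning formulation for LLMs; as a point of reference only, the sketch below shows what weight pruning means mechanically, using PyTorch's built-in per-layer magnitude pruning (this is not SparseLLM's algorithm):

```python
# Baseline magnitude pruning of a single linear layer: zero the 50% of weights
# with the smallest absolute value, then bake the mask into the weight tensor.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 512)
prune.l1_unstructured(layer, name="weight", amount=0.5)
prune.remove(layer, "weight")  # make the pruning permanent
print(f"sparsity: {(layer.weight == 0).float().mean().item():.2%}")  # ~50.00%
```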
keli-wen / AGI-Study

Blog posts, reading reports, and code examples for AGI/LLM-related knowledge.

code-examples, demo, inference-optimization, llm

Python 40
6 months ago
vbdi / divprune

[CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models

inference-optimization, llm, multimodal-large-language-models, pruning, vision-language-model, llava, multi-modality

Python 39
2 months ago
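For rough intuition about diversity-based token pruning (not DivPrune's actual criterion or code), one simple scheme is to greedily keep the visual tokens that are farthest from the tokens already kept:

```python
# Greedy max-min (farthest-point) selection over token embeddings -- an
# illustrative stand-in for diversity-based visual token pruning.
import torch

def select_diverse_tokens(tokens: torch.Tensor, keep: int) -> torch.Tensor:
    """tokens: (N, D) embeddings; returns indices of `keep` mutually distant tokens."""
    dist = torch.cdist(tokens, tokens)       # pairwise distances, (N, N)
    selected = [0]                           # seed with the first token
    min_dist = dist[0].clone()               # distance of each token to the kept set
    for _ in range(keep - 1):
        nxt = int(torch.argmax(min_dist))    # token farthest from everything kept so far
        selected.append(nxt)
        min_dist = torch.minimum(min_dist, dist[nxt])
    return torch.tensor(selected)

kept = select_diverse_tokens(torch.randn(576, 1024), keep=64)  # e.g. 576 visual tokens
print(kept.shape)  # torch.Size([64])
```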
ksm26 / Efficiently-Serving-LLMs

Learn the ins and outs of efficiently serving large language models (LLMs). Dive into optimization techniques, including KV caching and Low-Rank Adapters (LoRA), and gain hands-on experience with Pred...

batch-processing, inference-optimization, machine-learning-operations, model-inference-service, model-serving, text-generation

Jupyter Notebook 17
1 year ago
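One of the techniques the course covers, KV caching, avoids recomputing keys and values for the whole prefix at every decoding step: each step projects only the newest token and appends it to a cache. A toy single-head sketch (all weights and shapes below are made up):

```python
# Toy single-head attention decode loop with a KV cache: per step we project only
# the newest token's K/V and append, instead of re-projecting the entire prefix.
import torch
import torch.nn.functional as F

d = 64
wq, wk, wv = (torch.randn(d, d) for _ in range(3))
k_cache, v_cache = [], []

def decode_step(x_new: torch.Tensor) -> torch.Tensor:
    """x_new: (1, d) embedding of the newest token; returns its attention output."""
    q = x_new @ wq
    k_cache.append(x_new @ wk)
    v_cache.append(x_new @ wv)
    K, V = torch.cat(k_cache), torch.cat(v_cache)      # (t, d) each
    attn = F.softmax(q @ K.T / d ** 0.5, dim=-1)       # (1, t)
    return attn @ V                                    # (1, d)

for _ in range(8):
    out = decode_step(torch.randn(1, d))
print(out.shape, len(k_cache))  # torch.Size([1, 64]) 8
```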
lmaxwell / Armednn

A cross-platform, modular neural-network inference library; small and efficient.

inference-engine, neural-network, lstm, inference-optimization

C++ 13
2 years ago
ccs96307 / fast-llm-inference

Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.

acceleration, inference-optimization, large-language-models, speculative-decoding

Python 10
1 month ago
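For orientation, a deliberately simplified greedy form of speculative decoding (not this repo's implementation): a cheap draft model proposes a few tokens, the target model checks them, and the longest agreeing prefix is kept. In a real system the target scores all proposed positions in one batched forward pass; the toy below calls a stand-in function per position for clarity, and both "models" are fake.

```python
import torch

VOCAB = 100

def draft_model(prefix):   # cheap proposer (fake logits)
    torch.manual_seed(len(prefix)); return torch.randn(VOCAB)

def target_model(prefix):  # expensive verifier (fake logits)
    torch.manual_seed(len(prefix) * 7 + 1); return torch.randn(VOCAB)

def speculative_step(prefix, k=4):
    # 1) draft greedily proposes k tokens
    ctx, proposals = list(prefix), []
    for _ in range(k):
        tok = int(torch.argmax(draft_model(ctx)))
        proposals.append(tok); ctx.append(tok)
    # 2) target checks each proposed position; stop at the first disagreement
    accepted, ctx = [], list(prefix)
    for tok in proposals:
        best = int(torch.argmax(target_model(ctx)))
        if best != tok:
            accepted.append(best)     # target's own token replaces the rejected one
            break
        accepted.append(tok); ctx.append(tok)
    return prefix + accepted          # several tokens accepted per step when drafts agree

print(speculative_step([1, 2, 3]))
```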
Harly-1506 / Faster-Inference-yolov8

Faster YOLOv8 inference: optimize and export YOLOv8 models for faster inference using OpenVINO and NumPy 🔢

object-detection, openvino, segmentation, yolov8, image-processing, inference-optimization, numpy-arrays, OpenCV, torch, ultralytics

Python 10
8 months ago
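The stock Ultralytics export path covers much of this route: export a YOLOv8 checkpoint to OpenVINO IR and run inference through the same API. Treat the sketch below as the generic workflow, not necessarily this repo's exact pipeline; the weight file and image path are illustrative.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                 # pretrained detection checkpoint
ov_path = model.export(format="openvino")  # writes an OpenVINO IR package, returns its path
ov_model = YOLO(ov_path)                   # load the exported model with the same API
results = ov_model("bus.jpg")              # run inference on an image (path is illustrative)
print(results[0].boxes.xyxy)               # detected boxes in xyxy format
```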
grazder / template.cpp

A template for getting started writing code using GGML

C++, ggml, deep-learning, inference-optimization

C++ 9
1 year ago
amazon-science / llm-rank-pruning

LLM-Rank: a graph-theoretical approach to structured pruning of large language models based on weighted PageRank centrality, as introduced in the related paper.

graph-theory, inference-optimization, large-language-models, llm, pruning

Python 6
8 months ago
EZ-Optimium / Optimium

Your AI Catalyst: an inference backend to maximize your model's inference performance

amd, arm, deep-learning, inference, inference-engine, inference-optimization, intel, neural-network, runtime, tensorflow-lite, mediapipe, raspberry-pi

C++ 5
8 months ago
ResponsibleAILab / DAM

Dynamic Attention Mask (DAM) generates adaptive sparse attention masks per layer and head for Transformer models, enabling long-context inference with lower compute and memory overhead without fine-tuning.

inference-optimization
Python 5
2 months ago
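DAM's contribution is how the masks are generated; mechanically, applying a per-layer, per-head sparse attention mask looks like the PyTorch sketch below (the causal-plus-local pattern is made up for illustration, not DAM's learned mask):

```python
import torch
import torch.nn.functional as F

B, H, T, D = 1, 4, 128, 64
q, k, v = (torch.randn(B, H, T, D) for _ in range(3))

# Illustrative sparse pattern: causal attention restricted to a 32-token local window.
i = torch.arange(T)
causal = i[None, :] <= i[:, None]           # key index <= query index
local = (i[:, None] - i[None, :]) < 32      # within the last 32 positions
mask = (causal & local).expand(B, H, T, T)  # True = attend, False = masked out

out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)  # torch.Size([1, 4, 128, 64])
```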
effrosyni-papanastasiou / constrained-em

A constrained expectation-maximization algorithm for feasible graph inference.

inference-optimization
Jupyter Notebook 4
4 years ago