# model-quantization

Efficient-ML / Awesome-Model-Quantization

#Computer Science# A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research; we are continuously improving the project. Welcome to PR the works (pape...

deep-learning, quantization, Awesome Lists, model-compression, efficient-deep-learning, model-quantization
2.17k stars
5 months ago
horseee / Awesome-Efficient-LLM

#LLM# A curated list for efficient Large Language Models

compression, knowledge-distillation, language-model, LLM, model-quantization, efficient-llm
Python · 1.8k stars
1 month ago
datawhalechina / awesome-compression

A beginner-friendly tutorial on model compression; PDF download: https://github.com/datawhalechina/awesome-compression/releases

knowledge-distillation, model-compression, model-pruning, quantization, compression, model-quantization, neural-architecture-search, svd
309 stars
2 months ago
inferflow / inferflow

Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

llama2, llamacpp, llm-inference, model-quantization, multi-gpu-inference, mixture-of-experts, moe, gemma, falcon, minicpm, mistral, bloom, deepseek, internlm, baichuan2, mixtral, qwen
C++ · 245 stars
1 year ago
Efficient-ML / Awesome-Efficient-AIGC

#LLM# A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, covering both language and vision; we are continuously improving the project. Welcom...

aigc, diffusion-models, distillation, efficient-deep-learning, generative-model, large-language-models, LLM, model-compression, model-quantization, pruning, Awesome Lists
186 stars
6 months ago
sayakpaul / Adventures-in-TensorFlow-Lite

This repository contains notebooks that show how to use TensorFlow Lite to quantize deep neural networks (a minimal post-training quantization sketch follows this entry).

tensorflow-2, tensorflow-lite, on-device-ml, model-quantization, post-training-quantization, quantization-aware-training, pruning, inference
Jupyter Notebook · 171 stars
3 years ago
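
As an orientation only (not code from the repository above), a minimal post-training, dynamic-range quantization pass with the public tf.lite.TFLiteConverter API might look like the sketch below; the toy Keras model is a placeholder.

```python
# Minimal post-training (dynamic-range) quantization sketch with TensorFlow Lite.
# The toy Keras model is a placeholder; any trained tf.keras model converts the same way.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Optimize.DEFAULT enables dynamic-range quantization: weights are stored as int8,
# while activations stay in float and are quantized on the fly at inference time.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_dynamic_range.tflite", "wb") as f:
    f.write(tflite_model)
```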
RodolfoFerro / psychopathology-fer-assistant

[WINNER! 🏆] Psychopathology FER Assistant. Because mental health matters. My project submission for #TFWorld TF 2.0 Challenge at Devpost.

Python, Raspberry Pi, google-colab, model-quantization, TensorFlow, tflite, firebase-realtime-database, Flask, dash
Jupyter Notebook · 77 stars
2 years ago
htqin / BiBench

[ICML 2023] This project is the official implementation of our accepted ICML 2023 paper BiBench: Benchmarking and Analyzing Network Binarization.

benchmark, model-compression, binary-neural-networks, model-quantization
Python · 56 stars
1 year ago
htqin / QuantSR

[NeurIPS 2023 Spotlight] This project is the official implementation of our accepted NeurIPS 2023 (spotlight) paper QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution.

model-quantization, super-resolution
Python · 50 stars
1 year ago
nbasyl / OFQ

The official implementation of the ICML 2023 paper OFQ-ViT

icml, model-compression, model-quantization, vision-transformer, vision-transformers
Python · 33 stars
2 years ago
seonglae / llama2gptq

#LLM# Chat with LLaMA 2, with responses grounded in reference documents retrieved from a vector database. Runs a locally available model using GPTQ 4-bit quantization (see the quantization sketch after this entry).

langchain, quantization, transformers, model-quantization, CUDA, chatbot, question-answering, ChatGPT, gpt, llama-2, llama2
Python · 30 stars
2 years ago
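
This is not the repository's actual pipeline, but a rough sketch of 4-bit GPTQ quantization through the Hugging Face transformers integration; it assumes the optimum and auto-gptq packages are installed, and the model id is a placeholder.

```python
# Hedged sketch of 4-bit GPTQ quantization via transformers' GPTQ integration.
# Assumes optimum and auto-gptq are installed; the model id is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder causal-LM repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)

# GPTQ calibrates on a small text dataset ("c4" here) to choose 4-bit weight grids.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,  # quantizes weights to 4 bits while loading
)

# The quantized model can be saved and reloaded like a normal checkpoint.
model.save_pretrained("llama2-7b-gptq-4bit")
tokenizer.save_pretrained("llama2-7b-gptq-4bit")
```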
HaoranREN / TensorFlow_Model_Quantization

#Computer Science# A tutorial on model quantization using TensorFlow (a quantization-aware training sketch follows this entry)

model-quantization, TensorFlow, tensorflow-lite, tflite, machine-learning, quantization-aware-training
Python · 12 stars
4 years ago
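
The tutorial's own code may differ; as a generic illustration of quantization-aware training in TensorFlow, a sketch using the tensorflow-model-optimization toolkit (assumed installed) could look like this.

```python
# Hedged quantization-aware training (QAT) sketch with tensorflow-model-optimization.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model; in practice you would start from a (pre)trained network.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

# Wrap the model so fake-quantization ops simulate int8 arithmetic during training.
q_aware_model = tfmot.quantization.keras.quantize_model(model)
q_aware_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# q_aware_model.fit(...)  # fine-tune, then export with TFLiteConverter as usual
```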
cantbebetter2 / Awesome-Diffusion-Quantization

A list of papers, docs, and code about diffusion quantization. This repo collects various quantization methods for diffusion models. Welcome to PR the works (papers, repositories) missed by the repo.

Awesome Lists, diffusion-models, model-compression, model-quantization
7 stars
13 days ago
frickyinn / BiDense

PyTorch implementation of "BiDense: Binarization for Dense Prediction," a binary neural network for dense prediction tasks.

model-compression, model-quantization
Python · 6 stars
8 months ago
dcarpintero / ai-engineering

AI Engineering: annotated notebooks that dive into self-attention, in-context learning, RAG, knowledge graphs, fine-tuning, model optimization, and more.

bert, chunking, embeddings, fine-tuning, generative-ai, huggingface-transformers, in-context-learning, knowledge-graph, langchain, large-language-models, llama3-1, model-quantization, retrieval-augmented-generation, self-attention, transformer, weights-and-biases, ai-engineering
Jupyter Notebook · 6 stars
4 months ago
NANEXLABS / Nanex-AI

Enterprise multi-agent framework for secure, borderless data collaboration with zero-trust security and federated learning; lightweight and edge-ready.

artificial-intelligence, aiagent, edge-computing, federated-learning, grpc-web, iot-security, model-quantization, mqtt-protocol, onnx-runtime, tensorflow-lite, zero-trust-security
Python · 5 stars
4 months ago
medoidai / model-quantization-blog-notebooks

#Computer Science# Notebook from the "A Hands-On Walkthrough on Model Quantization" blog post.

artificial-intelligence, deep-learning, machine-learning, model-quantization
Jupyter Notebook · 4 stars
1 year ago
SRDdev / Model-Quantization

Quantization is a technique to reduce the computational and memory costs of running inference by representing the weights and activations with low-precision data types like 8-bit integer (int8) instea... (see the sketch after this entry)

machine-learning, model-quantization, quantization
Jupyter Notebook · 4 stars
2 years ago
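
To make the int8 idea concrete, here is a generic PyTorch sketch (not code from the repository): dynamic quantization stores the weights of selected layers as int8 while quantizing activations on the fly at inference time.

```python
# Generic int8 dynamic-quantization sketch in PyTorch; not from this repository.
import torch
import torch.nn as nn

# Placeholder float32 model.
model_fp32 = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Store Linear weights as int8; activations are quantized dynamically at runtime.
model_int8 = torch.ao.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(model_int8(x).shape)  # inference now uses int8 weight kernels
```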
BjornMelin / local-llm-workbench

🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed...

cpu-inference, CUDA, gpu-acceleration, inference-optimization, llama-cpp, local-llm, model-management, model-quantization
Shell · 2 stars
4 months ago
dwain-barnes / LLM-GGUF-Auto-Converter

#LLM# Automated Jupyter notebook solution for batch-converting Large Language Models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration (a conversion sketch follows this entry).

batch-processing, CUDA, gguf, huggingface, Jupyter Notebook, llama-cpp, LLM, model-quantization
Jupyter Notebook · 2 stars
6 months ago
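
The notebook's own automation is not reproduced here; the following is a rough sketch of the underlying HF-to-GGUF-to-quantized workflow driven from Python, assuming a local llama.cpp checkout. The script and binary names (convert_hf_to_gguf.py, llama-quantize) reflect recent llama.cpp layouts and may differ in other versions; paths and the quant type are placeholders.

```python
# Hedged sketch of an HF -> GGUF -> quantized-GGUF pipeline driven from Python.
# Assumes a local llama.cpp checkout; names and paths below are placeholders.
import subprocess

LLAMA_CPP = "/path/to/llama.cpp"      # placeholder checkout location
HF_MODEL_DIR = "/path/to/hf-model"    # placeholder Hugging Face model directory
F16_GGUF = "model-f16.gguf"
QUANT_GGUF = "model-Q4_K_M.gguf"

# 1) Convert the Hugging Face checkpoint to an f16 GGUF file.
subprocess.run(
    ["python", f"{LLAMA_CPP}/convert_hf_to_gguf.py", HF_MODEL_DIR,
     "--outfile", F16_GGUF, "--outtype", "f16"],
    check=True,
)

# 2) Quantize the GGUF file, e.g. to the common Q4_K_M 4-bit format.
subprocess.run(
    [f"{LLAMA_CPP}/build/bin/llama-quantize", F16_GGUF, QUANT_GGUF, "Q4_K_M"],
    check=True,
)
```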