SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
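Most of these toolkits ultimately map float tensors onto a low-bit integer grid with a scale factor. A minimal sketch of symmetric per-tensor INT8 quantization, assuming plain PyTorch; the helper names are illustrative and not the API of any listed library:

```python
# Illustrative symmetric INT8 weight quantization (per-tensor scale).
import torch

def quantize_int8(w: torch.Tensor):
    """Map a float tensor to int8 with a single symmetric scale."""
    scale = w.abs().max() / 127.0                      # one scale for the whole tensor
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor):
    """Recover an approximate float tensor from int8 values and the scale."""
    return q.float() * scale

w = torch.randn(4, 4)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
print((w - w_hat).abs().max())                         # per-element quantization error
```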
micronet, a model compression and deployment library. Compression: 1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / Quantization and Training of Neural Networks for Efficient Integer-Ari...
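Quantization-aware training inserts a "fake quantize" step into the forward pass and bypasses it in the backward pass with a straight-through estimator. A minimal sketch of that idea in PyTorch, in the spirit of DoReFa-style QAT rather than micronet's actual API:

```python
# Fake quantization with a straight-through estimator (STE) for QAT.
import torch
import torch.nn as nn

class FakeQuantSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, bits=8):
        qmax = 2 ** (bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        return torch.clamp(torch.round(x / scale), -qmax, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        # Straight-through: gradients pass as if quantization were identity.
        return grad_out, None

class QATLinear(nn.Linear):
    def forward(self, x):
        w_q = FakeQuantSTE.apply(self.weight)          # quantize weights on the fly
        return nn.functional.linear(x, w_q, self.bias)

layer = QATLinear(16, 8)
loss = layer(torch.randn(4, 16)).pow(2).mean()
loss.backward()                                        # gradients reach the float weights via STE
```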
#NLP#Neural Network Compression Framework for enhanced OpenVINO™ inference
#Computer Science#TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
YOLO model compression and multi-dataset training
#Computer Science#Tutorial notebooks for hls4ml
#Computer Science#A model compression and acceleration toolbox based on PyTorch.
#LLM#0️⃣1️⃣🤗 BitNet-Transformers: a Hugging Face Transformers implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch with the Llama(2) architecture
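The core idea behind 1-bit weights is to keep only the sign of each weight plus a per-tensor scale. A rough sketch in that spirit, assuming plain PyTorch; this is illustrative only and not the repository's exact BitNet formulation:

```python
# Sign + per-tensor scale: the essence of 1-bit weight quantization.
import torch

def binarize_weights(w: torch.Tensor):
    alpha = w.abs().mean()          # per-tensor scale factor
    w_bin = torch.sign(w)           # values in {-1, 0, +1}; zeros are rare for float weights
    return w_bin * alpha            # scaled 1-bit representation used in the matmul

w = torch.randn(8, 8)
w_1bit = binarize_weights(w)
print(torch.unique(torch.sign(w_1bit)))
```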
A toolkit for automated structural analysis and modification of PyTorch models, including a model compression algorithm library that automatically analyzes model structure
This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.
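The basic TensorFlow Lite workflow those notebooks cover is converting a Keras model with quantization enabled. A minimal sketch of post-training dynamic-range quantization, with a stand-in model:

```python
# Post-training dynamic-range quantization with the TFLite converter.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enable weight quantization
tflite_model = converter.convert()

with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)                              # deployable quantized flatbuffer
```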
#Computer Science#FrostNet: Towards Quantization-Aware Network Architecture Search
#Computer Science#Notes on quantization in neural networks
Quantization Aware Training
#Computer Science#Quantization-aware training with spiking neural networks
Train neural networks with joint quantization and pruning on both weights and activations using any PyTorch modules
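Joint compression typically means zeroing out low-magnitude weights and then fake-quantizing whatever survives, for both weights and activations. A minimal sketch assuming only standard torch utilities; it illustrates the combined idea rather than that repository's API:

```python
# Magnitude pruning combined with fake quantization of weights and activations.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def fake_quant(x, bits=8):
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(x / scale), -qmax, qmax) * scale

layer = nn.Linear(32, 16)
prune.l1_unstructured(layer, name="weight", amount=0.5)   # zero out 50% of the weights

x = torch.randn(4, 32)
w_q = fake_quant(layer.weight)                            # quantize the pruned weights
a_q = fake_quant(torch.relu(layer(x)))                    # quantize the activations too
```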
3rd-place solution for the NeurIPS 2019 MicroNet challenge
FakeQuantize with Learned Step Size (LSQ+) as an Observer in PyTorch
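In LSQ-style quantization the step size is a trainable parameter rather than a statistic collected by an observer. A simplified sketch of that idea with a straight-through round, assuming plain PyTorch; it is not the repository's observer code:

```python
# Learned-step-size quantizer: the scale is an nn.Parameter trained by backprop.
import torch
import torch.nn as nn

class LSQQuantizer(nn.Module):
    def __init__(self, bits=8):
        super().__init__()
        self.qmax = 2 ** (bits - 1) - 1
        self.step = nn.Parameter(torch.tensor(0.1))    # learned step size s

    def forward(self, x):
        s = self.step.abs().clamp(min=1e-8)
        x_div = x / s
        # Straight-through round: forward uses the rounded value,
        # backward treats rounding as the identity.
        x_q = (x_div.round() - x_div).detach() + x_div
        x_q = x_q.clamp(-self.qmax, self.qmax)
        return x_q * s                                  # dequantized output

quant = LSQQuantizer(bits=4)
x = torch.randn(8, requires_grad=True)
quant(x).sum().backward()
print(quant.step.grad)                                  # the step size receives a gradient
```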
Code for the paper 'Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware'
Official implementation of "Quantized Spike-driven Transformer" (ICLR 2025)
QT-DoG: Quantization-Aware Training for Domain Generalization