quantization · GitHub Topics

#Natural Language Processing#Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

fine-tuning llama llm peft transformers rlhf qlora quantization qwen instruction-tuning gpt lora large-language-models agent ai moe llama3 deepseek gemma nlp

Python 55.27 k

20 hours ago

ymcui / Chinese-LLaMA-Alpaca

#Natural Language Processing#Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)

llm plm pre-trained-language-models alpaca llama nlp quantization large-language-models lora alpaca-2 llama-2

Python 18.89 k

16 days ago

SYSTRAN / faster-whisper

#Computer Science#Faster Whisper transcription with CTranslate2

deep-learning inference quantization speech-recognition speech-to-text transformer Whisper openai

Python 17.32 k

2 months ago

UFund-Me / Qbot

#Blockchain#[🔥updating ...] AI automated quantitative trading bot (fully local deployment). AI-powered Quantitative Investment Research Platform. 📃 online docs: https://ufund-me.github.io/Qbot ✨ :news: qbot-mini: https://github.com/Charmve/iQuant

funds machine-learning pytrade quantitative-finance quantitative-trading quantization strategies trademarks quant-trader bitcoin blockchain deep-learning fintech backtest

Jupyter Notebook 12.42 k

25 days ago

bitsandbytes-foundation / bitsandbytes

#Large Language Models#Accessible large language models via k-bit quantization for PyTorch.

llm machine-learning PyTorch qlora quantization

Python 7.42 k

1 day ago

kornelski / pngquant

Lossy PNG compressor — pngquant command based on libimagequant library

pngquant Code quality png png-compression quantization stdin palette conversion image-optimization C

C 5.42 k

24 days ago

AutoGPTQ / AutoGPTQ

#Natural Language Processing#An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

transformers deep-learning inference large-language-models llm nlp PyTorch quantization transformer

Python 4.91 k

4 months ago

IntelLabs / distiller

Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller

PyTorch pruning quantization Jupyter Notebook deep-neural-networks regularization distillation onnx

Jupyter Notebook 4.4 k

2 years ago

OpenNMT / CTranslate2

#Computer Science#Fast inference engine for Transformer models

neural-machine-translation C++ mkl quantization CUDA thrust opennmt deep-neural-networks openmp onednn intrinsics avx2 avx parallel-computing gemm neon transformer-models machine-translation deep-learning inference

C++ 3.93 k

4 months ago

neuralmagic / deepsparse

#Natural Language Processing#Sparsity-aware deep learning inference runtime for CPUs

machine-learning onnx inference computer-vision object-detection pruning quantization pretrained-models nlp cpus sparsification llm-inference performance

Python 3.16 k

2 months ago

huawei-noah / Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

knowledge-distillation model-compression quantization pretrained-models

Python 3.12 k

2 years ago

huggingface / optimum

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools

onnx PyTorch inference training intel graphcore onnxruntime transformers quantization habana optimization tflite

Python 3 k

18 hours ago

IntelLabs / nlp-architect

#Natural Language Processing#A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

deep-learning nlp nlu Tensorflow dynet PyTorch bert transformers quantization

Python 2.94 k

3 years ago

aaron-xichen / pytorch-playground

Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)

pytorch-tutorial pytorch-tutorials PyTorch quantization

Python 2.68 k

3 years ago

stochasticai / xTuring

#Large Language Models#Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJ...

deep-learning fine-tuning gpt-2 gpt-j llama llm lora language-model alpaca finetuning adapter gen-ai generative-ai mistral peft quantization

Python 2.66 k

10 months ago

nunchaku-tech / nunchaku

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

diffusion-models flux genai lora mlsys quantization iclr iclr2025 comfyui

Python 2.51 k

1 day ago

intel / neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

low-precision pruning sparsity auto-tuning knowledge-distillation quantization quantization-aware-training post-training-quantization smoothquant large-language-models gptq int8

Python 2.46 k

2 days ago

quic / aimet

#Computer Science#AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

quantization deep-learning compression Open Source machine-learning pruning auto-ml deep-neural-networks

Python 2.38 k

2 days ago

dvmazur / mixtral-offloading

#Large Language Models#Run Mixtral-8x7B models in Colab or on consumer desktops

colab-notebook deep-learning google-colab language-model llm mixture-of-experts offloading PyTorch quantization

Python 2.32 k

1 year ago

666DZY666 / micronet

micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), High-Bit (>2b) (DoReFa/Quantization and Training of Neural Networks for Efficient Integer-Ari...

quantization pruning dorefa twn bnn xnor-net PyTorch model-compression group-convolution convolutional-networks quantization-aware-training post-training-quantization tensorrt onnx

Python 2.26 k

3 months ago