SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
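The low-bit quantization these toolkits implement can be illustrated with a minimal, dependency-free sketch of symmetric per-tensor INT8 quantization (function names are illustrative, not any library's API):

```python
def quantize_int8(values):
    # Symmetric per-tensor quantization: map floats onto the integer range [-127, 127]
    # using a single scale derived from the largest absolute value.
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    # Recover approximate floats; per-element error is at most scale / 2.
    return [v * scale for v in q]

weights = [0.02, -1.5, 0.7, 3.0, -0.001]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
```

Real toolkits refine this idea with per-channel or per-group scales, zero-points for asymmetric ranges, and calibration data, but the round-trip above is the core operation.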
Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Advanced quantization algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA, and HPU. Seamlessly integrated with Torchao, Transformers, and vLLM. Export your models effortlessly to autogpt...
#Large Language Models#Large Language Models for All, 🦙 Cult and More, Stay in touch!
#Large Language Models#Run any Large Language Model behind a unified API
#Large Language Models#🪶 Lightweight OpenAI drop-in replacement for Kubernetes
#Large Language Models#A guide on how to use GPTQ models with LangChain
Private self-improvement coaching with open-source LLMs
#Large Language Models#An OpenAI-compatible API that integrates LLM, Embedding, and Reranker.
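"OpenAI-compatible" for servers like this one means accepting the same JSON request bodies as the OpenAI API. A minimal sketch of the request shapes for the chat and embeddings routes (the model names are placeholders; only the field layout follows the published API):

```python
import json

def chat_request(model, user_message):
    # Minimal /v1/chat/completions request body, following the OpenAI API shape.
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def embeddings_request(model, texts):
    # Minimal /v1/embeddings request body; "input" may be a string or a list of strings.
    return {"model": model, "input": texts}

body = chat_request("my-local-model", "Hello")
payload = json.dumps(body)  # POST this to the server's /v1/chat/completions route
```

Because the shapes match, existing OpenAI client libraries can usually be pointed at such a server by changing only the base URL.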
#Large Language Models#ChatSakura: an open-source multilingual conversational model.
#Large Language Models#Run GGUF LLM models in the latest version of TextGen-webui
#Large Language Models#This repository is for profiling, extracting, visualizing, and reusing generative AI weights, with the aim of building more accurate AI models and auditing/scanning weights at rest to identify knowledge domains for ...
#Large Language Models#A.L.I.C.E (Artificial Labile Intelligence Cybernated Existence): a REST API for an AI companion, intended for building more complex systems
#Large Language Models#This project will develop a NEPSE chatbot using an open-source LLM, incorporating sentence transformers, a vector database, and reranking.
#NLP#Code for the NAACL paper "When Quantization Affects Confidence of Large Language Models?"
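The "confidence" studied in work like this is commonly the probability a model assigns to its chosen token, obtained by applying softmax to the raw logits. A dependency-free sketch of that computation (illustrative only, not the paper's code):

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_confidence(logits):
    # Confidence = probability assigned to the argmax (greedy) token.
    return max(softmax(logits))

conf = token_confidence([2.0, 1.0, 0.1])
```

Quantization perturbs the logits, so even when the argmax token is unchanged, this confidence value can shift, which is the effect such papers measure.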
#NLP#Effortlessly quantize, benchmark, and publish Hugging Face models, with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.
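The 75% figure follows directly from bit widths: moving weights from 16-bit to 4-bit storage cuts their size by a factor of four. A quick back-of-the-envelope check (the 7B parameter count is a placeholder, and quantization metadata such as scales is ignored):

```python
def model_size_bytes(n_params, bits_per_weight):
    # Weight storage only; real checkpoints add a small overhead for
    # per-group scales and zero-points.
    return n_params * bits_per_weight / 8

n = 7_000_000_000                  # e.g. a 7B-parameter model
fp16 = model_size_bytes(n, 16)     # 14 GB
int4 = model_size_bytes(n, 4)      # 3.5 GB
reduction = 1 - int4 / fp16
print(f"{reduction:.0%}")  # → 75%
```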