
This repository has been indexed but not yet curated. For the project introduction and usage tutorial, please read the README on GitHub.



About

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

Last updated

2024-08-13T09:45:30Z


Languages

  • Python 64.7%
  • Cuda 26.7%
  • C++ 8.6%

Other open-source projects from SqueezeAILab

#NLP# [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling

Python 1.76k
1 year ago

You may also be interested in

#LLM# Grok-1 open-sourced

Python 50.53k
1 year ago

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:

Python 2.25k
5 months ago

Open-Sora: a fully open-source, efficient reproduction of a Sora-like video generation pipeline

Python 27.32k
5 months ago

#LLM# LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7.13k
13 hours ago

#NLP# [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Python 385
1 year ago

Training LLMs with QLoRA + FSDP

Jupyter Notebook 1.53k
1 year ago

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 2.19k
2 years ago

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 3.28k
3 months ago

#NLP# LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 3.63k
5 days ago

#LLM# Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

Python 12.46k
5 days ago

#LLM# SGLang is a fast serving framework for large language models and vision language models.

Python 18.64k
4 hours ago

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLMs

Python 168
1 year ago

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Python 1.61k
1 year ago

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Python 2.5k
21 hours ago

#LLM# HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance

Python 2.43k
2 months ago

An easy-to-use package for implementing SmoothQuant for LLMs

Python 105
6 months ago

#LLM# Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook 2.63k
1 year ago