[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
2023-06-12
2024-08-13T09:45:30Z
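SqueezeLLM's core idea is to decompose each weight matrix into a dense low-bit part and a small sparse matrix holding the outlier weights. A minimal sketch of that decomposition, assuming a simple magnitude-based outlier criterion (the actual method additionally uses sensitivity-based non-uniform quantization; `dense_and_sparse_split` and all sizes here are illustrative):

```python
import torch

def dense_and_sparse_split(W: torch.Tensor, outlier_frac: float = 0.005):
    """Split a weight matrix into a sparse FP16 outlier part and a
    dense remainder quantized to 4 bits. Conceptual sketch only; the
    SqueezeLLM paper uses sensitivity-weighted non-uniform centroids
    rather than the uniform grid used here."""
    k = max(1, int(outlier_frac * W.numel()))
    thresh = W.abs().flatten().topk(k).values.min()
    outlier_mask = W.abs() >= thresh
    sparse = (W * outlier_mask).to_sparse()     # few large-magnitude weights, kept full precision
    dense = W * (~outlier_mask)                 # narrow-range remainder, easy to quantize
    scale = dense.abs().max() / 7.0             # uniform INT4 grid for the sketch
    q = torch.clamp(torch.round(dense / scale), -8, 7).to(torch.int8)
    return q, scale, sparse

W = torch.randn(256, 256)
q, scale, sparse = dense_and_sparse_split(W)
recon = q.float() * scale + sparse.to_dense()
print((W - recon).abs().max())                  # reconstruction error
```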
#Natural Language Processing#[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
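LLMCompiler's speedup comes from planning tool calls as a dependency graph and executing independent calls concurrently instead of one at a time. A minimal sketch of that execution pattern with `asyncio` (the `search` and `fetch_weather` tools are hypothetical; the actual planner and executor are much richer):

```python
import asyncio

# Two hypothetical tools with no dependency between them.
async def search(query: str) -> str:
    await asyncio.sleep(0.1)          # stand-in for a real API call
    return f"results for {query!r}"

async def fetch_weather(city: str) -> str:
    await asyncio.sleep(0.1)
    return f"weather in {city}"

async def main() -> None:
    # Both calls are independent in the plan's dependency graph,
    # so they are dispatched in parallel rather than sequentially.
    results = await asyncio.gather(
        search("ICML 2024 quantization papers"),
        fetch_weather("Berkeley"),
    )
    print(results)

asyncio.run(main())
```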
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
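A quantization flow along the lines of AutoAWQ's documented example (the model path, output directory, and config values are illustrative; check the project docs for current options):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"        # illustrative model
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)  # runs AWQ calibration
model.save_quantized("mistral-7b-awq")
tokenizer.save_pretrained("mistral-7b-awq")
```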
Open-Sora: a fully open-source solution for efficiently reproducing Sora-like video generation
#Large Language Models#LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
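A minimal offline-inference sketch following LMDeploy's quick start (the model name is illustrative):

```python
from lmdeploy import pipeline

# Build an inference pipeline around a chat model and run a batch of prompts.
pipe = pipeline("internlm/internlm2-chat-7b")
response = pipe(["What is post-training quantization?"])
print(response)
```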
#Natural Language Processing#[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
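One observation behind KVQuant is that key outliers are aligned per channel, so keys quantize better with per-channel scales (the paper also quantizes keys before RoPE and isolates outliers separately). A toy sketch of per-channel low-bit key quantization:

```python
import torch

def quantize_keys_per_channel(keys: torch.Tensor, n_bits: int = 4):
    """Quantize cached keys with one scale per channel, the axis along
    which key outliers cluster. Conceptual sketch only; KVQuant adds
    pre-RoPE quantization, non-uniform datatypes, and outlier handling.
    keys: [seq_len, head_dim]."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = keys.abs().amax(dim=0, keepdim=True) / qmax   # one scale per channel
    q = torch.clamp(torch.round(keys / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

keys = torch.randn(1024, 128)
q, scale = quantize_keys_per_channel(keys)
print((keys - q.float() * scale).abs().max())             # per-channel error
```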
Training LLMs with QLoRA + FSDP
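The QLoRA half of this recipe loads the base model in 4-bit NF4 and trains LoRA adapters on top; FSDP then shards the quantized model across GPUs, which the training config wires up. A sketch of the QLoRA side with `transformers` + `peft` (the model name and LoRA hyperparameters are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 base weights with bf16 compute, the standard QLoRA setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb_config
)

# Trainable low-rank adapters on the attention projections.
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```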
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
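GPTQ rounds weights one at a time and redistributes each rounding error over the not-yet-quantized weights using second-order (Hessian) information from calibration data. A toy single-row sketch of that update (the real code processes columns in blocks via a Cholesky factorization; all sizes here are illustrative):

```python
import torch

def gptq_row_quantize(w: torch.Tensor, H_inv: torch.Tensor, scale: float) -> torch.Tensor:
    """Error-compensated rounding for one weight row: after quantizing
    weight i, its rounding error is spread over the remaining weights
    via the inverse Hessian of the layer reconstruction loss."""
    w = w.clone()
    q = torch.empty_like(w)
    for i in range(w.numel()):
        q[i] = torch.clamp(torch.round(w[i] / scale), -8, 7) * scale   # INT4 round-trip
        err = (w[i] - q[i]) / H_inv[i, i]
        w[i + 1:] -= err * H_inv[i, i + 1:]                            # compensate the rest
    return q

X = torch.randn(512, 64)                                  # calibration activations (hypothetical)
H_inv = torch.linalg.inv(2 * X.T @ X + 0.01 * torch.eye(64))  # damped inverse Hessian
w = torch.randn(64)
q = gptq_row_quantize(w, H_inv, scale=w.abs().max().item() / 7)
```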
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
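AWQ scales up the weight channels that matter most to activations before quantizing, folding the inverse scale into the previous operation so the full-precision output is unchanged. A toy sketch of the scale search, simplified from the paper (which also searches clipping thresholds and fuses the scales):

```python
import torch

def fake_quantize(W: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Per-tensor INT4 round-trip used by the search below."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = W.abs().amax() / qmax
    return torch.clamp(torch.round(W / scale), -qmax - 1, qmax) * scale

def awq_scale_search(W: torch.Tensor, X: torch.Tensor, n_grid: int = 20) -> torch.Tensor:
    """Grid-search an exponent alpha so that s = mean|X|^alpha, scale
    weights up by s (with 1/s folded into the previous op), and keep
    the s that minimizes the quantized layer's output error.
    W: [c_out, c_in], X: [tokens, c_in]."""
    act_mag = X.abs().mean(dim=0)                      # per-channel activation magnitude
    ref = X @ W.T
    best_err, best_s = float("inf"), torch.ones_like(act_mag)
    for i in range(n_grid):
        s = act_mag.pow(i / n_grid).clamp(min=1e-4)
        Wq = fake_quantize(W * s) / s                  # quantize scaled weights, undo the scale
        err = (ref - X @ Wq.T).pow(2).mean().item()
        if err < best_err:
            best_err, best_s = err, s
    return best_s
```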
#Large Language Models#Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLMs
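GEAR combines coarse quantization of the KV cache with a low-rank approximation of the quantization residual, plus a small sparse outlier matrix that is omitted here. A toy sketch of the quantize-plus-low-rank step:

```python
import torch

def gear_like_compress(kv: torch.Tensor, n_bits: int = 4, rank: int = 4):
    """Coarse uniform quantization of a KV tensor plus a rank-r SVD
    approximation of the residual. Conceptual sketch; the real recipe
    also keeps a sparse outlier matrix. kv: [seq_len, hidden]."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = kv.abs().amax() / qmax
    q = torch.clamp(torch.round(kv / scale), -qmax - 1, qmax)
    residual = kv - q * scale
    U, S, Vh = torch.linalg.svd(residual, full_matrices=False)
    low_rank = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank]
    return q.to(torch.int8), scale, low_rank

kv = torch.randn(1024, 128)
q, scale, low_rank = gear_like_compress(kv)
approx = q.float() * scale + low_rank
print((kv - approx).abs().max())   # near-lossless reconstruction error
```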
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
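GaLore projects the full gradient into a low-rank subspace, takes the optimizer step there, and projects back, so the memory-heavy optimizer state lives in the small space. A heavily simplified single-step sketch (the real method keeps Adam statistics in the subspace and refreshes the projector by periodic SVD):

```python
import torch

def galore_step(param: torch.Tensor, grad: torch.Tensor, P: torch.Tensor, lr: float = 1e-3):
    """One simplified GaLore update: compress the gradient into the
    rank-r subspace spanned by P, then project the step back to the
    full parameter shape. Optimizer state (omitted) would live in the
    small [r, n] space."""
    g_low = P.T @ grad          # [r, n] low-rank gradient
    param -= lr * (P @ g_low)   # project back and apply

W = torch.randn(512, 512)
G = torch.randn(512, 512)                        # stand-in for a real gradient
U, _, _ = torch.linalg.svd(G, full_matrices=False)
P = U[:, :8]                                     # rank-8 projector from the gradient's SVD
galore_step(W, G, P)
```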
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
An easy-to-use package for implementing SmoothQuant for LLMs
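SmoothQuant migrates quantization difficulty from activations to weights with a per-channel factor s_j = max|X_j|^alpha / max|W_j|^(1-alpha); dividing activations by s and multiplying weights by s leaves the layer's output mathematically unchanged. A sketch of just that equation:

```python
import torch

def smoothquant_scales(X: torch.Tensor, W: torch.Tensor, alpha: float = 0.5):
    """Per-channel smoothing factors from the SmoothQuant paper.
    X: [tokens, c_in] activations, W: [c_out, c_in] weights."""
    act_max = X.abs().amax(dim=0)          # per input channel
    w_max = W.abs().amax(dim=0)
    s = act_max.pow(alpha) / w_max.pow(1 - alpha)
    return s.clamp(min=1e-5)

X = torch.randn(256, 768) * 10             # activations with large outliers
W = torch.randn(768, 768)
s = smoothquant_scales(X, W)
X_smooth, W_smooth = X / s, W * s          # X @ W.T == X_smooth @ W_smooth.T
print((X @ W.T - X_smooth @ W_smooth.T).abs().max())
```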
#Large Language Models#Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
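Medusa adds extra decoding heads on the base model's last hidden state, each proposing a token several steps ahead; the proposals are then verified in a single base-model forward pass. A conceptual sketch of the drafting step (tree attention and acceptance rules omitted; all sizes are hypothetical and the heads here are untrained):

```python
import torch

hidden = torch.randn(1, 4096)                               # base model's last hidden state
heads = [torch.nn.Linear(4096, 32000) for _ in range(4)]    # 4 Medusa heads over the vocab

# Each head speculates one of the next 4 tokens from the same hidden
# state; the real system builds a candidate tree from top-k proposals
# and verifies it in one pass of the base model.
draft = torch.cat([head(hidden).argmax(dim=-1) for head in heads])
print(draft)                                                # candidate tokens t+1 .. t+4
```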