speculative-decoding · GitHub Topics

intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

llm chatbot 4-bits llm-inference llm-cpu chatpdf streamingllm intel-optimized-llamacpp speculative-decoding habana rag retrieval

Python 2.17 k

10 months ago

aphrodite-engine / aphrodite-engine

#Computer science# Large-scale LLM inference engine

API inference-engine machine-learning CUDA inferentia rocm intel lora speculative-decoding tpu

C++ 1.49 k

2 days ago

SafeAILab / EAGLE

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.

large-language-models llm-inference speculative-decoding

Python 1.44 k

3 days ago

Infini-AI-Lab / Sequoia

#LLM# Scalable and robust tree-based speculative decoding algorithm

efficiency inference llm speculative-decoding

Python 351

6 months ago

facebookresearch / LayerSkip

#LLM# Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024

llm optimization transformers speculative-decoding

Python 323

3 months ago

Infini-AI-Lab / TriForce

#LLM# [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

acceleration llm long-context speculative-decoding llm-inference efficiency inference

Python 261

1 year ago

FasterDecoding / REST

REST: Retrieval-Based Speculative Decoding, NAACL 2024

llm-inference retrieval speculative-decoding

C 205

8 months ago

Infini-AI-Lab / UMbreLLa

LLM Inference on consumer devices

llm-inference offloading speculative-decoding

Python 123

4 months ago

bigai-nlco / TokenSwift

[ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation

deepseek inference llm-inference llm-serving llm qwen speculative-decoding transformer

Python 110

2 months ago

kssteven418 / BigLittleDecoder

#LLM# [NeurIPS'23] Speculative Decoding with Big Little Decoder

decoding efficient-inference llm speculative-decoding

Python 93

1 year ago

romsto / Speculative-Decoding

#LLM# Implementation of the paper "Fast Inference from Transformers via Speculative Decoding", Leviathan et al. 2023.

llm llm-inference speculative-decoding

Python 72

8 months ago

hemingkx / SWIFT

[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

speculative-decoding

Python 52

5 months ago

hemingkx / SpecDec

Codes for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)

speculative-decoding non-autoregressive

Python 42

2 years ago

BaohaoLiao / RSD

[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.

efficiency large-language-models reasoning speculative-decoding

Python 40

3 months ago

vladislavkruglikov / eagle

#NLP# Pretty and simple-to-use implementation of the EAGLE speculative decoding algorithm, an extrapolation algorithm for greater language model efficiency 🦅

llm llm-inference nlp speculative-decoding

Python 36

17 days ago

Tencent / AngelSlim

#LLM# Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.

llm quantization speculative-decoding diffusion vlm

Python 31

4 days ago

AutonomicPerfectionist / PipeInfer

#LLM# PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation

inference llamacpp llm speculative-decoding

C++ 30

9 months ago

hyx1999 / SAM-Decoding

Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton

speculative-decoding

Python 29

6 months ago

mscheong01 / speculative_decoding.c

#LLM# Minimal C implementation of speculative decoding based on llama2.c

ai C llama2 llm speculative-decoding

C 24

1 year ago

jadohu / LANTERN

Official Implementation of LANTERN (ICLR'25) and LANTERN++(ICLRW-SCOPE'25)

speculative-decoding

Python 16

5 months ago