kv-cache

HDT3213 / godis

A Redis server and distributed cluster implemented in Go.

Tags: kv-cache, Go, redis-server, Redis, godis, cluster, redis-cluster
Go · 3.73k stars
Updated 1 month ago

LMCache / LMCache

#LLM# Supercharge Your LLM with the Fastest KV Cache Layer

Tags: amd, CUDA, inference, kv-cache, LLM, PyTorch, rocm, vllm, fast, speed
Python · 3.72k stars
Updated 21 hours ago

MemTensor / MemOS

#LLM# MemOS (Preview) | Intelligence Begins with Memory

Tags: agent, kv-cache, language-model, LLM, lora, memory, Neo4j, tree, llm-memory, long-term-memory, memory-management, rag, retrieval-augmented-generation
Python · 2.08k stars
Updated 2 days ago

Zefan-Cai / KVCache-Factory

#LLM# Unified KV Cache Compression Methods for Auto-Regressive Models

Tags: kv-cache, LLM
Python · 1.22k stars
Updated 7 months ago

harleyszhang / llm_note

#LLM# LLM notes covering model inference, transformer model structure, and LLM framework code analysis.

Tags: LLM, llm-inference, vllm, cuda-programming, kv-cache, transformer-models
Python · 805 stars
Updated 15 hours ago

therealoliver / Deepdive-llama3-from-scratch

Implement Llama 3 inference step by step: grasp the core concepts, work through the derivations, and write the code. (A minimal KV-cache decoding sketch follows this entry.)

Tags: inference, kv-cache, llama, LLM, attention, attention-mechanism, gpt, language-model, mask, parsing, transformer
Jupyter Notebook · 602 stars
Updated 5 months ago

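For readers following a from-scratch walkthrough like the one above, a minimal sketch of what a KV cache does during autoregressive decoding may help. It is an illustrative toy (single attention head, NumPy, made-up dimensions), not code from that repository.

    # Toy single-head attention decode loop with a KV cache (NumPy).
    # Illustrative only; dimensions and projections are made up.
    import numpy as np

    d = 8                      # head dimension (assumed)
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

    k_cache, v_cache = [], []  # grows by one entry per generated token

    def decode_step(x):
        """x: embedding of the newest token, shape (d,)."""
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        k_cache.append(k)      # cache K/V so earlier tokens are never re-projected
        v_cache.append(v)
        K = np.stack(k_cache)  # (t, d): keys of all tokens seen so far
        V = np.stack(v_cache)
        scores = K @ q / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ V     # attention output for the newest token

    for _ in range(5):         # pretend we generate 5 tokens
        out = decode_step(rng.standard_normal(d))
    print("cached tokens:", len(k_cache), "output shape:", out.shape)
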
NVIDIA / kvpress

#LLM# LLM KV cache compression made easy

Tags: LLM, inference, kv-cache, long-context, Python, PyTorch, transformers, large-language-models
Python · 560 stars
Updated 2 days ago

FMInference / H2O

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. (A loose eviction sketch in this spirit follows this entry.)

Tags: gpt-3, high-throughput, kv-cache, large-language-models, sparsity
Python · 462 stars
Updated 1 year ago

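The heavy-hitter idea named in the title is to keep only the cache entries whose tokens have received most of the accumulated attention mass, plus the most recent tokens. The sketch below is a loose, hedged illustration of that kind of score-based eviction, not the authors' implementation; the budget and window sizes are assumptions.

    # Loose illustration of heavy-hitter-style KV cache eviction (NumPy).
    # Keeps the entries with the largest accumulated attention mass plus the
    # most recent tokens; a sketch, not the H2O reference code.
    import numpy as np

    def evict(k_cache, v_cache, attn_history, budget=64, keep_recent=16):
        """k_cache, v_cache: (t, d); attn_history: (t,) accumulated attention per token."""
        t = k_cache.shape[0]
        if t <= budget:
            return k_cache, v_cache, attn_history
        recent = np.arange(t - keep_recent, t)                     # always keep newest tokens
        older = np.arange(t - keep_recent)
        n_heavy = budget - keep_recent
        heavy = older[np.argsort(attn_history[older])[-n_heavy:]]  # top accumulated scores
        keep = np.sort(np.concatenate([heavy, recent]))
        return k_cache[keep], v_cache[keep], attn_history[keep]

    rng = np.random.default_rng(0)
    k, v = rng.standard_normal((200, 8)), rng.standard_normal((200, 8))
    scores = rng.random(200)
    k2, v2, s2 = evict(k, v, scores)
    print(k2.shape)  # (64, 8): cache trimmed to the budget
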
dipampaul17 / KVSplit

#LLM# Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. (An illustrative quantization sketch follows this entry.)

Tags: apple-silicon, generative-ai, kv-cache, llama-cpp, LLM, m1, m3, memory-optimization, metal, optimization, quantization
Python · 356 stars
Updated 2 months ago

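KVSplit itself works through llama.cpp on Apple Silicon; the sketch below is only a back-of-the-envelope illustration of the differentiated-precision idea (keys at 8 bits, values at 4 bits) and the resulting memory ratio. The helper names, scaling scheme, and sizes are assumptions for illustration, not the project's implementation.

    # Back-of-the-envelope sketch of differentiated-precision KV quantization
    # (8-bit keys, 4-bit values) with per-row symmetric scaling. Illustrative only.
    import numpy as np

    def quantize(x, bits):
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(x).max(axis=-1, keepdims=True) / qmax + 1e-12
        q = np.clip(np.round(x / scale), -qmax, qmax)
        return q, scale

    def dequantize(q, scale):
        return q * scale

    rng = np.random.default_rng(0)
    K = rng.standard_normal((1024, 128)).astype(np.float32)   # fake cached keys
    V = rng.standard_normal((1024, 128)).astype(np.float32)   # fake cached values

    qK, sK = quantize(K, bits=8)   # keys kept at higher precision
    qV, sV = quantize(V, bits=4)   # values tolerate coarser precision

    fp16_bytes = (K.size + V.size) * 2
    mixed_bytes = K.size * 1 + V.size * 0.5       # 1 byte per key element, 0.5 per value element
    print(f"memory vs fp16: {mixed_bytes / fp16_bytes:.0%}")
    print("mean abs error  K:", np.abs(K - dequantize(qK, sK)).mean(),
          " V:", np.abs(V - dequantize(qV, sV)).mean())
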
Zefan-Cai / Awesome-LLM-KV-Cache

#LLM# Awesome-LLM-KV-Cache: a curated list of 📙 awesome LLM KV cache papers with code.

Tags: kv-cache, LLM
342 stars
Updated 5 months ago

raymin0223 / mixture_of_recursions

#LLM# Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Tags: kv-cache, LLM, router
Python · 332 stars
Updated 3 days ago

NVIDIA-Merlin / HierarchicalKV

HierarchicalKV is part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. Its key capability is storing key-value feature embeddings on high-bandwidth GPU memory and in host memory. (A toy two-tier sketch follows this entry.)

Tags: CUDA, gpu, hashtable, recommender-system, key-value-store, kv-cache
Cuda · 163 stars
Updated 2 days ago

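HierarchicalKV is a CUDA library; as a language-agnostic illustration of the hierarchical idea (a small fast tier backed by a larger slow tier, with eviction and promotion between them), here is a toy Python sketch. The LRU policy and tier sizes are assumptions, not the library's actual design.

    # Toy two-tier key-value store: a small "fast" tier (think GPU HBM) backed by
    # a larger "slow" tier (think host memory), with LRU eviction from the fast tier.
    from collections import OrderedDict

    class TwoTierKV:
        def __init__(self, fast_capacity=4):
            self.fast = OrderedDict()   # hot entries, bounded size
            self.slow = {}              # everything else
            self.fast_capacity = fast_capacity

        def put(self, key, value):
            self.fast[key] = value
            self.fast.move_to_end(key)
            if len(self.fast) > self.fast_capacity:
                old_key, old_val = self.fast.popitem(last=False)  # evict LRU entry
                self.slow[old_key] = old_val                      # demote to slow tier

        def get(self, key):
            if key in self.fast:
                self.fast.move_to_end(key)        # refresh recency
                return self.fast[key]
            if key in self.slow:
                value = self.slow.pop(key)        # promote on access
                self.put(key, value)
                return value
            return None

    kv = TwoTierKV()
    for i in range(10):
        kv.put(f"emb:{i}", [float(i)])
    print(kv.get("emb:0"), len(kv.fast), len(kv.slow))   # emb:0 promoted back to fast
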
itsnamgyu / block-transformer

#LLM# Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)

Tags: kv-cache, LLM, llm-inference
Python · 160 stars
Updated 4 months ago

kddubey / cappr

Completion After Prompt Probability: make your LLM make a choice. (A conceptual scoring sketch follows this entry.)

Tags: text-classification, zero-shot, huggingface, prompt-engineering, llamacpp, probability, llm-inference, kv-cache
Python · 80 stars
Updated 9 months ago

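The underlying idea, as the description states, is to score each candidate completion by its probability after the prompt and pick the most probable one. The sketch below illustrates that scoring loop with a stand-in log-probability function; it is not cappr's API, and the toy scorer is purely for demonstration.

    # Conceptual "completion after prompt probability" sketch: sum per-token
    # log-probs of each candidate completion and pick the argmax.
    # The log-prob function here is a stand-in, not cappr's API.
    import math

    def completion_logprob(prompt, completion, token_logprob):
        total, context = 0.0, prompt
        for tok in completion.split():
            total += token_logprob(context, tok)   # log P(tok | context)
            context += " " + tok
        return total

    def choose(prompt, candidates, token_logprob):
        return max(candidates, key=lambda c: completion_logprob(prompt, c, token_logprob))

    # Toy stand-in "model": slightly favors tokens already present in the context.
    def toy_token_logprob(context, tok):
        return math.log(0.5) if tok.lower() in context.lower() else math.log(0.1)

    print(choose("to be or not to be", ["to be", "a sandwich"], toy_token_logprob))
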
aju22 / LLaMA2

#NLP# This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT) variant. The implementation focuses on the model architecture ...

Tags: attention, gpt, kv-cache, llama, llama2, LLM, NLP, transformer
Python · 69 stars
Updated 2 years ago

hkproj / pytorch-llama-notes

Notes about the LLaMA 2 model

Tags: attention-is-all-you-need, kv-cache, llama2, study-notes
Python · 66 stars
Updated 2 years ago

DRSY / EasyKV

#LLM# Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)

Tags: kv-cache, LLM
Python · 63 stars
Updated 1 year ago

MaxBelitsky / cache-steering

#LLM# KV Cache Steering for Inducing Reasoning in Small Language Models

Tags: kv-cache, large-language-models, LLM, reasoning
Python · 35 stars
Updated 8 days ago

NoakLiu / PiKV

PiKV: MoE KV Cache Management System [Efficient ML System]

Tags: moe, parallel-computing, kv-cache, management-system, mixture-of-experts
Python · 14 stars
Updated 1 month ago

DongmingShenDS / Mistral_From_Scratch

Mistral and Mixtral (MoE) implemented from scratch. (A rolling-buffer KV cache sketch follows this entry.)

Tags: kv-cache, large-language-models, mistral-7b, mixtral-8x7b, mixture-of-experts
Python · 9 stars
Updated 1 year ago

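Mistral's published architecture pairs sliding-window attention with a rolling-buffer KV cache in which position i is written to slot i mod W, so cache memory stays bounded by the window size. Below is a minimal NumPy sketch of such a buffer with made-up sizes; it is not taken from this repository's code.

    # Minimal rolling-buffer KV cache as used with sliding-window attention:
    # position i overwrites slot i % window, so memory stays bounded at `window`.
    import numpy as np

    class RollingKVCache:
        def __init__(self, window=4, d=8):
            self.window = window
            self.k = np.zeros((window, d))
            self.v = np.zeros((window, d))
            self.pos = 0                         # number of tokens seen so far

        def append(self, k, v):
            slot = self.pos % self.window        # oldest entry gets overwritten
            self.k[slot], self.v[slot] = k, v
            self.pos += 1

        def view(self):
            """Return cached K/V in chronological order (at most `window` entries)."""
            n = min(self.pos, self.window)
            order = [(self.pos - n + i) % self.window for i in range(n)]
            return self.k[order], self.v[order]

    cache = RollingKVCache()
    rng = np.random.default_rng(0)
    for _ in range(10):
        cache.append(rng.standard_normal(8), rng.standard_normal(8))
    K, V = cache.view()
    print(K.shape)  # (4, 8): only the last `window` tokens are kept
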